Sustainability of Digital Formats: Planning for Library of Congress Collections


PDF/R-1 (Raster image transport and storage.) Based on PDF 1.4-1.7 (ISO 32000-1)


Identification and description Explanation of format description terms

Full name PDF/R-1 based on PDF 1.4-1.7 as defined in ISO 32000-1. ISO 23504-1: Document management applications -- Raster image transport and storage -- Part 1: Use of ISO 32000
Description

The PDF/R format was developed by the TWAIN Working Group and the PDF Association (PDFA.org) as a very limited subset of PDF, designed primarily for storing and transmitting scanned page data. The format is a key component of TWAIN Direct, an open protocol intended to allow applications to communicate with scanners without the need for proprietary scanner drivers. A specification for PDF/R (aka PDF/raster) was submitted for international standardization to ISO/TC 171/SC 2 and approved for publication as ISO 23504-1:2020 -- Document management applications -- Raster image transport and storage -- Part 1: Use of ISO 32000. The specification document covers two variants of PDF/R, differing in whether they are unencrypted and conform to ISO 32000-1 or are encrypted and conform to ISO 32000-2 (often referred to as PDF 2.0). This format description is focused on the former. For the differences specific to the encrypted variant of PDF/R-1 based on PDF 2.0, see PDF/R-1_enc.

Among the benefits and advantages over the full PDF format presented in the introduction to the PDF/R-1 specification are the following:

  • Files can be read and written without a full PDF parser or generator and rendered without a complex rendering engine.
  • Files can be created efficiently from raster images and generated using a fixed-size raster data buffer.
  • The exact original raster image data can be recovered.
  • Images can be located in the file and read efficiently with comparatively simple code.
  • The specification provides a precise, well-defined target, simplifying engineering design and testing.
  • PDF/R files can be quickly and easily identified as such by software. See File signifiers below.
  • PDF/R supports only effective and readily available compression algorithms.

PDF/R retains optional PDF features for protecting and authenticating content:

  • For implementations that need to protect document content at rest, there is an encrypted variant, which is based on ISO 32000-2 (PDF 2.0) in order to support stronger encryption than permitted in ISO 32000-1. See PDF/R-1_enc.
  • One or more digital signatures may be used for implementations that require verification of the document origin, authenticity, date or time of creation, and so on.

The TWAIN Working Group considered pros and cons of PDF in comparison to JPEG and TIFF before choosing PDF as the basis for the file format for use in TWAIN Direct. See the introduction to PDF/raster slideshow or presentation video from 2017, given by the chair of the TWAIN Working Group. PDF was preferred to TIFF as an actively maintained international standard with widespread viewer support. Disadvantages of JPEG were the lack of support for multi-page documents or bitonal images.

The restrictions with which a PDF/R-1 file must comply were described informally in a 2017 Overview of PDF/raster by the PDF Association. The list of restrictions below is based largely on that overview:

  • Page contents may only include images in ITU/CCITT G4 (FAX), DCT (JPEG), and uncompressed raster formats. These images are restricted to RGB color, grayscale, and bitonal formats.
  • Page contents may not include anything other than raster images: no text, no line art, no forms, or other graphical elements. [Note: This excludes the incorporation of searchable text derived by OCR.]
  • Transparency and layers are not permitted.
  • No compression of non-image data is permitted. Compressed object streams are disallowed.
  • Each page object shall contain a Contents stream that fills the required MediaBox (which defines the page size). [Note: Page-level metadata streams are allowed as well as document-level metadata.]
  • The use of annotations is limited to those used for digital signatures. Digital signatures that render a visual presentation on the page are not permitted.
  • Most optional structures are disallowed. For example, interactive actions, bookmarks, search indexes, and marked content are not permitted.
  • Versions of PDF prior to 1.4 may not be used for PDF/R files.
  • The only circumstance in which a PDF/R-1 file may include an incremental update is for the application of a digital signature.
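The image-related restrictions above can be sketched as a small check. This is a minimal illustration, not a validator: the function name, the plain-value model of an image, and the mode labels are hypothetical. The filter names CCITTFaxDecode and DCTDecode are the standard PDF names for G4 (FAX) and JPEG compression; the bit-depth rule is taken from the Clarity entry of this description.

```python
# Hypothetical, simplified model of one page image; not a real PDF object.
ALLOWED_FILTERS = {None, "CCITTFaxDecode", "DCTDecode"}  # None = uncompressed
ALLOWED_MODES = {"rgb", "gray", "bitonal"}


def image_problems(filter_name, mode, bits_per_channel):
    """Return a list of reasons the described image would break the listed rules."""
    problems = []
    if filter_name not in ALLOWED_FILTERS:
        problems.append(f"filter {filter_name!r} is not G4, DCT, or uncompressed")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode {mode!r} is not RGB, grayscale, or bitonal")
    if mode == "bitonal" and bits_per_channel != 1:
        problems.append("bitonal images use 1 bit per pixel")
    if mode in ("rgb", "gray") and bits_per_channel not in (8, 16):
        problems.append("color and grayscale need 8 or 16 bits per channel")
    if filter_name == "CCITTFaxDecode" and mode != "bitonal":
        problems.append("G4 (FAX) compression applies only to bitonal images")
    if filter_name == "DCTDecode" and mode == "bitonal":
        problems.append("JPEG does not support bitonal images")
    return problems


if __name__ == "__main__":
    print(image_problems("CCITTFaxDecode", "bitonal", 1))
    print(image_problems("FlateDecode", "rgb", 8))
```

A real conformance check would also have to inspect page content streams, annotations, and the other structural rules in the list, which this sketch does not attempt.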

As indicated in Annex A to the PDF/R-1 specification, it is possible to create PDF/R files that also comply with PDF/A. However, the annex states that the only PDF/A profiles to which a PDF/R file can conform are PDF/A-1b and PDF/A-2b; other profiles of PDF/A-1 and PDF/A-2 require searchable text.

Production phase Intended as an initial-state format, to be generated in scanner hardware, and a middle-state format for transmittal to the next stage in a scanning workflow. Note that if any processing is performed on a PDF/R file other than applying a digital signature, the file saved after that process is no longer a valid PDF/R instance. In particular, a PDF/R to which searchable text has been added, e.g., by OCR, is not a valid PDF/R.
Relationship to other formats
    Subtype of PDF_family, PDF (Portable Document Format) Family
    Has modified version PDF/R-1_enc, PDF/R-1 (Raster image transport and storage). Encrypted, based on PDF 2.0 (ISO 32000-2). Used for encrypted PDF/R-1 files.

Local use

LC experience or existing holdings The Library of Congress has no experience with PDF/R files.
LC preference The Library of Congress Recommended Formats Statement (RFS), which expresses format preferences for content in its collections, states that XML-based markup is preferred for textual content when available. For formats that represent page layout rather than marked up text, PDF/UA and PDF/A are preferred over other PDFs. Next in order of preference are PDFs with features such as searchable text, embedded fonts, lossless compression, and high resolution images. These preferences apply to textual works in digital form, electronic serials, digital musical scores, and accompanying image/text files for digital audio.

Sustainability factors

Disclosure

PDF/R-1 is based on the publicly documented PDF/raster specification developed through a collaboration between two international associations that develop and promote standards, the TWAIN Working Group and the PDF Association (PDFA). In April 2020, the PDF/R-1 specification was approved as ISO 23504-1 under the auspices of ISO/TC 171/SC 2.

The TWAIN Working Group describes itself as "a not-for-profit organization with the sole purpose of fostering a universal public standard which links applications and image acquisition devices." See https://twain.org/. On its website, PDFA states, "A global association, the PDF Association is an initiative of the Association for Digital Document Standards (ADDS) e.V., founded in September 2006." See https://www.pdfa.org/about-us/.

According to the introduction to PDF/Raster 1.0 at PDFA, "PDF/Raster was created in a collaboration between the TWAIN Working Group, which originated the PDF/raster concept, and the PDF Association, which provided PDF technology expertise and a means of communicating with the PDF software industry at large to ensure that a diverse range of relevant viewpoints was represented."

    Documentation

The specification for the unencrypted variant of PDF/R-1, which is based on ISO 32000-1, is in the same document as the specification for an encrypted variant based on ISO 32000-2 (aka PDF 2.0). The specification as originally submitted to ISO is available from the PDF Association and also from TWAIN's pdfraster.org site. An ISO version of the specification was approved as ISO 23504-1: Raster image transport and storage -- Use of ISO 32000 (PDF/R-1) in April 2020.

Adoption

PDF/R is a relatively new specification. Adoption of PDF/R will be tightly coupled to adoption of the new TWAIN Direct protocol. An August 22, 2018 post from the Document Imaging Report site states, "adoption of TWAIN Direct has faced a chicken-and-egg type of challenge. Scanner vendors seem to be waiting for software vendors to release applications that support TWAIN Direct before embedding the technology in their devices. Of course, if no scanners support TWAIN Direct, there isn’t much driving software vendors to create connections to it." In 2018, TWAIN supplied a configuration for software development and testing by TWAIN members, using preliminary releases of products known as TWAIN Local and TWAIN Bridge. TWAIN Bridge is software for a PC that allows the TWAIN Direct specification to be used with existing scanners that support the widely used TWAIN specification now referred to as "TWAIN Classic."

In September 2019, the TWAIN Working Group made a public release of the TWAIN Direct specifications. See announcements at PDF Association and in Markets Insider (Sept. 4, 2019).

The TWAIN Direct code repository on GitHub provides sample code for reading and writing PDF/R files.

The compilers of this resource would appreciate updated information related to the adoption of PDF/R. Comments welcome.

    Licensing and patents The introduction to the PDF/R specification states, "There is no intention herein to claim any intellectual property that is not present in the existing PDF standard, nor claim any IP that is covered therein." See PDF_family.
Transparency Depends upon compliant software tools to read. Building tools requires sophistication. However, because many of the complexities permitted in regular PDF files are not permitted in PDF/R, code for generating or rendering PDF/R files can be simpler than that for richer PDFs with more functionality.
Self-documentation Can include XMP metadata streams for metadata at document level and page level.
External dependencies A PDF/R-1 file can be rendered with any PDF viewer that can render PDFs in the chronological versions permitted in ISO 32000. The PDF/R format has been designed for generating files within scanning hardware, but might be used as a wrapper for bit-mapped images from other sources.
Technical protection considerations Encryption is not permitted for PDF/R documents that conform to ISO 32000-1. Encryption is permitted for PDF/R files that conform to ISO 32000-2, the specification for PDF 2.0. See PDF/R-1_enc.

Quality and functionality factors

Still Image
Normal rendering PDF/R is designed for page-oriented documents. Scaling, zooming, and printing are expected functionalities for PDF viewers. The quality of a rendered page depends on the quality of the embedded raster image.
Clarity (high image resolution) The bit-depth for color or grayscale images must be either 8 or 16 bits per channel. There is no explicit limit to spatial resolution for a page image. PDF/R allows for pages to be divided into strips; this might be used to facilitate scanning of large-format pages or at very high resolution.
Color maintenance May specify an ICC color profile. The recommended color space is sRGB, although other color spaces may be used. PDF readers may ignore the specified ICC profile for an RGB-based color space and default to DeviceRGB. The image XObjects in a PDF/R file may specify an Intent to guide rendering software as to how to adjust the image for display or printing on a device that does not fully support the color space specified.
Support for vector graphics, including graphic effects and typography No support for vector graphics or combination of text and images. The PDF/R variant of PDF supports only raster images as page content.
Support for multispectral bands Not designed for multispectral images.
Functionality beyond normal rendering PDF/R has support for multi-page scanned documents and custom metadata (using XMP) for the document and for individual pages.
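The Clarity entry above notes that pages may be divided into strips, and the specification's introduction mentions generation from a fixed-size raster data buffer. The arithmetic connecting the two can be sketched as follows. This is an illustration under stated assumptions: the function name, the 1 MiB buffer, and the idea of sizing strips to fit one buffer are hypothetical and apply to uncompressed sample data only. The sketch relies on the general PDF convention that each row of image samples starts on a byte boundary.

```python
def strip_plan(width_px, height_px, channels, bits_per_channel, buffer_bytes):
    """Return (row_bytes, rows_per_strip, strip_count) for uncompressed samples.

    Hypothetical helper: sizes strips so that each one fits in buffer_bytes.
    """
    # Each row is padded up to a whole number of bytes.
    row_bits = width_px * channels * bits_per_channel
    row_bytes = (row_bits + 7) // 8
    rows_per_strip = buffer_bytes // row_bytes
    if rows_per_strip < 1:
        raise ValueError("buffer is smaller than one image row")
    # A strip never needs more rows than the page has.
    rows_per_strip = min(rows_per_strip, height_px)
    # Ceiling division: the last strip may be shorter than the others.
    strip_count = (height_px + rows_per_strip - 1) // rows_per_strip
    return row_bytes, rows_per_strip, strip_count


if __name__ == "__main__":
    # 8.5 x 11 inch page at 300 dpi, 8-bit RGB, assumed 1 MiB buffer.
    print(strip_plan(2550, 3300, 3, 8, 1024 * 1024))
```

For the letter-size example, each row takes 7650 bytes, so a 1 MiB buffer holds 137 rows and the page needs 25 strips.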

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | pdf | Used for all PDF files. See PDF_family.
Internet Media Type | application/pdf | Used for all PDF files. See PDF_family.
Magic numbers | %PDF-1.4, %PDF-1.5, %PDF-1.6, %PDF-1.7 | From PDF/R-1 specification.
File signature | %PDF-raster | The version of the PDF/R specification is identified in a comment line in the PDF trailer, located immediately before the line containing startxref. The form of the comment is %PDF-raster-x.y, where x is the major version number and y is the minor version number. For the initial specification, the version is 1.0.
Indicator for profile, level, version, etc. | See note. | See File signature signifier above.
Pronom PUID | See note. | PRONOM has no entry for PDF/R as of June 2020.
Wikidata Title ID | See note. | There is no Wikidata Title ID for PDF/R as of June 2020.
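
The magic numbers and file signature described above suggest a simple identification routine. The following is a minimal sketch, not a conformance test: the function name is hypothetical, the whole file is assumed to fit in memory, and blank lines between the comment and startxref are tolerated as a leniency that the specification may not allow.

```python
import re

# Header versions listed above as magic numbers for unencrypted PDF/R-1.
ALLOWED_HEADERS = (b"%PDF-1.4", b"%PDF-1.5", b"%PDF-1.6", b"%PDF-1.7")

# Comment of the form %PDF-raster-x.y (x = major version, y = minor version).
RASTER_TAG = re.compile(rb"^%PDF-raster-(\d+)\.(\d+)$")


def pdf_raster_version(data: bytes):
    """Return (major, minor) if data carries the PDF/R signature, else None."""
    if not data.startswith(ALLOWED_HEADERS):
        return None
    # Split on CRLF, CR, or LF line endings and drop blank lines.
    lines = [ln.strip() for ln in re.split(rb"\r\n|\r|\n", data) if ln.strip()]
    # Search backward for the last startxref line and inspect the line before it.
    for i in range(len(lines) - 1, 0, -1):
        if lines[i] == b"startxref":
            match = RASTER_TAG.match(lines[i - 1])
            if match is None:
                return None
            return int(match.group(1)), int(match.group(2))
    return None


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\ntrailer\n<< /Size 2 >>\n"
        b"%PDF-raster-1.0\nstartxref\n9\n%%EOF\n"
    )
    print(pdf_raster_version(sample))
```

Only the last startxref is examined here. A file that has received an incremental update for a digital signature contains more than one trailer, and how the comment is placed in that case should be confirmed against the specification.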

Notes

General  
History

The original specification for PDF/R was developed by the TWAIN Working Group using the name "PDF/raster." In association with the PDF Association, the TWAIN Working Group submitted this specification in mid-2017 to ISO/TC 171/SC 2 for standardization. One of the changes introduced during the ISO process was to adopt the name "PDF/R." The specification was approved as ISO 23504-1 in April 2020.



Last Updated: 08/10/2021