Sustainability of Digital Formats: Planning for Library of Congress Collections


EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135:2014

Identification and description

Full name EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135:2014
Description

EPUB is a format for electronic publications with reflowable text in marked up document structure with associated images for illustrations, all in a container format. Reflowable text allows the text display to be optimized for the particular display device used by the reader of the EPUB-formatted book; this is in contrast to documents with pre-determined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 3 also provides for audio, including synchronization of text and audio, and for text-to-speech synthesis. Video may be embedded, but readers need not support video. Creators must provide a still image as fallback for a video clip.

The EPUB 3.0 specification comprises an overview and four separate specification documents:

  • Open Container Format (OCF) 3.0. Specifies the mandatory container for an EPUB. OCF is based on the ZIP format.
  • EPUB Publications 3.0. Defines how components of an EPUB publication are related, defining the natural reading order, and identifying alternate representations for content elements. The Package Document also holds document metadata. Specifies options for representation of the content of an electronic publication. Specifies "preferred vocabularies" for XML-based content components and "core media types" that include raster and vector image formats that readers must support.
  • EPUB Content Documents 3.0. Defines profiles of XHTML, SVG, and CSS for use as the primary content in EPUB publications.
  • EPUB Media Overlays 3.0. Defines a format and a processing model for synchronization of text and audio.

EPUB is the unifying term used to denote a collection of Content Documents, a Package Document, at least one Navigation Document, and other supporting files, typically in a variety of media types, including structured text and graphics, packaged in a ZIP-based OCF container that constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB complying with the EPUB 3.0 specifications has the extension .epub; changing the extension to .zip will permit exploration of the individual files.

According to 2.1 Package Document in the EPUB 3.0 Overview, an EPUB Publication includes a single XML-based Package Document. This (a) specifies all the publication's constituent content documents and other required resources, through a manifest element, (b) defines a reading order for linear consumption, through a spine element, and (c) associates publication-level metadata and navigation information. An EPUB Navigation Document in HTML5/XHTML is a required component of an EPUB Publication; it provides the basis for both machine-readable and human-readable navigation. A mandatory toc nav element defines the primary navigation hierarchy and must be consistent with the spine. Other nav elements can support navigation options familiar from printed books, such as lists of illustrations, or from electronic documents, such as marks for significant structural components. Although only one XML-based Package Document is permitted, according to 4.5 Fallbacks, multiple instances in alternate formats of a complete work can be delivered in a single EPUB file, by defining multiple rootfile elements in the OCF container file. This may be used, for example, so that a formatted graphic novel defined via a sequence of SVG pages can be accompanied by an accessible text version defined via XHTML.
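The relationship between the manifest and the spine can be illustrated with a minimal Python sketch. It assumes the Package Document uses the namespace http://www.idpf.org/2007/opf and that the Navigation Document is flagged with properties="nav" on its manifest item; the sample package text and the function names reading_order and nav_document are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Package Document elements.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, minimal Package Document used only for this illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2020-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(opf_xml):
    """Return manifest hrefs in the linear order given by the spine."""
    root = ET.fromstring(opf_xml)
    # The manifest maps item ids to resource paths.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists item ids in reading order.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def nav_document(opf_xml):
    """Return the href of the manifest item flagged as the Navigation Document."""
    root = ET.fromstring(opf_xml)
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if "nav" in (item.get("properties") or "").split():
            return item.get("href")
    return None


print(reading_order(SAMPLE_OPF))
print(nav_document(SAMPLE_OPF))
```

Note that the order of items in the manifest carries no meaning in this sketch; only the spine determines the reading order.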

Two supplementary specifications were published later by IDPF. The first, published as an informational document in 2012, specified how support for fixed-layout documents could be provided: EPUB 3 Fixed-Layout Documents. In particular, this added a property rendition:layout with permitted values "reflowable" and "pre-paginated" to be defined for the entire publication or for an individual content document listed in the spine. The second supplementary specification was approved as an IDPF Recommendation in 2014: EPUB Canonical Fragment Identifier (epubcfi) Specification. This defined a method for referencing arbitrary content within an EPUB Publication to support external linking to specific locations (e.g., chapters or paragraphs) in a work.
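As a rough illustration of the publication-level rendition:layout property, the following Python sketch reads it from a Package Document. It assumes the property is expressed as a meta element in the package metadata and that "reflowable" applies when the property is absent; both are assumptions of this sketch, as are the sample documents and the function name publication_layout. Overrides on individual spine items are not handled.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package fragments used only for this illustration.
FIXED_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
    prefix="rendition: http://www.idpf.org/vocab/rendition/#">
  <metadata>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
</package>
"""

PLAIN_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata/>
</package>
"""


def publication_layout(opf_xml):
    """Return the publication-wide layout declared in the package metadata."""
    root = ET.fromstring(opf_xml)
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("property") == "rendition:layout":
            value = (meta.text or "").strip()
            if value in ("reflowable", "pre-paginated"):
                return value
    # Assumption: treat a missing declaration as reflowable.
    return "reflowable"


print(publication_layout(FIXED_OPF))
print(publication_layout(PLAIN_OPF))
```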

EPUB 3.0 was approved by IDPF as a Recommendation in October 2011. EPUB 3.0 was submitted to ISO/IEC JTC1/SC34 as a Draft Technical Specification via the JTC 1 fast-track procedure by the Korean national standards body and published as ISO/IEC TS 30135 parts 1-7 in 2014. Each of these seven ISO/IEC documents is identical to its IDPF equivalent; see http://idpf.org/epub/30/.

Production phase An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users
Relationship to other formats
    Subtype of EPUB_family, Electronic Publication (EPUB) File Format Family
    Has earlier version EPUB_2, EPUB, Electronic Publication, Version 2. Last minor version of EPUB, Version 2 was 2.0.1, approved in 2010. There are substantial changes between EPUB 2 and EPUB 3.0.
    Has later version EPUB_3_0_1, EPUB, Electronic Publication, Version 3.0.1 (2014). ISO/IEC 23736:2020
    May contain HTML_5, HyperText Markup Language (HTML) 5. EPUB 3.0 uses the XML syntax for HTML, i.e. the successor to XHTML.
    May contain GIF, GIF Graphics Interchange Format, Version 89a
    May contain JFIF, JFIF, JPEG File Interchange Format
    May contain PNG, Portable Network Graphics
    May contain SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1. EPUB 3.0 specifies a slightly restricted version of SVG 1.1. In particular, animation objects are not permitted.
    May contain MP3_ENC, MP3 Audio Encoding. Assumed to be wrapped in the widely used de facto file format MP3_FF which wraps MP3 encoding with optional ID3 metadata blocks.
    May contain MP4_FF_2_AAC, MPEG-4 File Format, V.2, with Advanced Audio Encoding. Limited to Low Complexity audio compression. See AAC_MP4_LC, AAC (MPEG-4) Low Complexity Object.
    May contain VP8, VP8 Video Codec
    May contain MPEG-4_AVC, MPEG-4, Advanced Video Coding (Part 10) (H.264)
    May contain An EPUB 3.0 Package Document may contain one or more links to bibliographic records for the EPUB Publication in other schemas, including MARCXML, MODS, ONIX, and XMP. Such records may be included in the EPUB Container or referred to by URI.

Local use

LC experience or existing holdings The Library of Congress has received some ebooks in EPUB 3.0 format for its collections. An open-access example can be seen at https://www.loc.gov/item/2019299120/.
LC preference

As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB 3.0 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted, if the file is not subject to technological protection that inhibits long-term preservation and access, and if all content is stored within the EPUB container. Bibliographic metadata records, for example, in the ONIX schema, may optionally be included in the EPUB container or may be available through a link to an external record. The Library of Congress would want to receive or access such metadata records in conjunction with ingestion of an EPUB publication.

The Library of Congress Recommended Formats Statement (RFS) lists EPUB 3.0 as a Preferred format for Textual Works - Digital.


Sustainability factors

Disclosure

Open standard, developed under the auspices of the International Digital Publishing Forum (IDPF).

Approved as a Technical Specification by ISO/IEC JTC1 and published as ISO/IEC TS 30135:2014 in seven parts. Within ISO and IEC, EPUB is considered by a special joint working group (ISO/IEC JTC 1/SC 34/JWG 7). JWG7 spans several ISO and IEC committees: JTC 1/SC 34 (Document description and processing languages), ISO TC 46/SC 4 (Technical interoperability), and IEC/TC 100/TA 10 (Multimedia e-publishing and e-book technologies).

    Documentation Specifications for EPUB version 3.0 from IDPF.
Adoption

Version 3.0 of EPUB introduced several features that were welcomed by publishers and users, such as better support for graphic publications and for mathematics. A minor update, EPUB 3.0.1, was published in 2014, primarily to add flexibility for publishers and for maintenance of the specification; it also made support by readers for rendering of fixed-layout publications mandatory. See EPUB 3.0.1 Changes from EPUB 3.0. These two editions of the EPUB specification have been widely adopted. The compilers of this resource have made no attempt to distinguish the level of adoption of EPUB 3.0 from that of EPUB 3.0.1. Comments welcome.

Lists of software for reading and editing EPUB documents are available on the Wikipedia entry for EPUB. The compilers of this resource have not confirmed whether the information on individual applications is up to date.

Among the earliest tools for supporting the deployment of EPUB 3.0 were EpubCheck (a free, open-source validator) and a reader plug-in for the Chrome browser from Readium, an IDPF project started to accelerate adoption of EPUB 3.0.

Before the EPUB 3.0 specification was formally published, Adobe introduced support for some EPUB 3.0 features in InDesign as part of Creative Suite 5.5 (released April 2011). This included support for Japanese scripts and embedded rich media. For current support, see Adobe InDesign User Guide: Export content for EPUB. As of early 2020, most desktop publishing and many word-processing applications provide options to export to EPUB. QuarkXPress offers options to export in reflowable or fixed-layout EPUB format. Apple provides templates designed for creating books and provides advanced book creation options in Pages; these options are based on EPUB. WordPerfect provides tools for publishing ebooks in EPUB or MOBI format. Writer2ePub is an extension for OpenOffice.org or LibreOffice that creates an ePub file from any file format that Writer can read.

    Licensing and patents No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for version 3.0.
Transparency

EPUB 3 allows the use of either UTF-8 or UTF-16 for encoding resources in an EPUB publication.

Text content must be in XHTML/HTML5, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including key and algorithm used. Embedded fonts may be obfuscated (see Notes below) which also reduces transparency.

Any interactive functionality supported by embedded JavaScript will be harder to preserve for the long term than static reflowable content.

Self-documentation

The mandatory EPUB Package Document can include unqualified Dublin Core (DCMES) metadata and readers must recognize these elements. dc:title, dc:identifier, and dc:language are mandatory. If more than one title is present, titles are required to be given title-type properties, to allow for series/collection titles, subtitles, short titles, edition statements, etc. Also mandatory is the dcterms:modified property (expressed through a meta element), which is combined with dc:identifier to act as an identifier for a particular package. A meta element may be used to define and populate other metadata elements. In addition, a link can be made to externally stored metadata records in other schemas.
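A check for the mandatory metadata described above might look like the following Python sketch. It assumes the Dublin Core namespace http://purl.org/dc/elements/1.1/ and that the modification date appears as a meta element with property="dcterms:modified"; the sample package and the function name missing_required_metadata are hypothetical. It is not a substitute for a full validator such as EpubCheck.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, deliberately incomplete package used only for illustration.
INCOMPLETE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
  </metadata>
</package>
"""


def missing_required_metadata(opf_xml):
    """List the mandatory metadata entries that are absent or empty."""
    required = ["dc:title", "dc:identifier", "dc:language"]
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return required + ["dcterms:modified"]
    missing = []
    for name in required:
        element = metadata.find(name, NS)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    # The modification date is carried by a meta element, not a dc: element.
    has_modified = any(
        meta.get("property") == "dcterms:modified" and (meta.text or "").strip()
        for meta in metadata.findall("opf:meta", NS)
    )
    if not has_modified:
        missing.append("dcterms:modified")
    return missing


print(missing_required_metadata(INCOMPLETE_OPF))
```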

The EPUB Package Document also includes a manifest of component files. A mandatory component file is an EPUB Navigation Document, which provides structural metadata to relate the various content documents through a table of contents. The mandatory spine element in the Package Document stipulates a natural reading order.


Accessibility Features

EPUB files have good support for accessibility features. See EPUB (Electronic Publication) File Format Family for accessibility information.

External dependencies No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor.
Technical protection considerations

In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management (DRM) terms and procedures. Commercial publishers can be expected to protect their EPUB files with DRM. The lack of a standard for DRM has led to fragmentation in the market: different retailers use non-interoperable DRM schemes that are tied in with eBook reader devices or apps.

Embedded third-party fonts may be "obfuscated" by partial encryption. See Notes below for more information. The result is that any reader tool has to be capable of performing the decryption in order to be able to use the intended fonts.
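For triage before ingest, a container can be screened for signs of encryption or DRM. The Python sketch below assumes that such information, when present, is stored in the container files META-INF/encryption.xml and META-INF/rights.xml; the function name protection_markers and the sample container are hypothetical. The presence of encryption.xml does not by itself show that content is encrypted, since it may describe only obfuscated fonts.

```python
import io
import zipfile


def protection_markers(epub_bytes):
    """Report whether the container holds encryption or rights files."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        names = set(archive.namelist())
    return {
        "encryption": "META-INF/encryption.xml" in names,
        "rights": "META-INF/rights.xml" in names,
    }


def build_sample_container(with_encryption):
    """Build a hypothetical, incomplete container purely for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        if with_encryption:
            archive.writestr("META-INF/encryption.xml", "<encryption/>")
    return buffer.getvalue()


print(protection_markers(build_sample_container(with_encryption=True)))
print(protection_markers(build_sample_container(with_encryption=False)))
```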


Quality and functionality factors

Text
Normal rendering Good support.
Integrity of document structure The logical structure of a document is an essential feature of EPUB.
Integrity of layout and display Publishers may choose to control some aspects of layout through style-sheets. However, flowable text, by definition, will break lines and paginate text differently depending on the reading platform and user choices.
Support for mathematics, formulae, etc. EPUB 3.0 can contain MathML markup.
Functionality beyond normal rendering Flowable text can adapt to reading devices with a variety of form factors. Synchronization of audio and text is supported. A pronunciation lexicon can be embedded to support text to speech renderings.

File type signifiers and format identifiers

Tag: Value
    Note
Filename extension: epub
    Recommended extension for the EPUB container file.
Internet Media Type: application/epub+zip
    From OCF specification.
Magic numbers: See note.
    From OCF specification:
      • The bytes “PK” will be at the beginning of the file, followed by two additional bytes from the ZIP specification: \003 \004
      • The bytes “mimetype” will be at position 30
      • The actual MIME type (i.e., the ASCII string “application/epub+zip”) will begin at position 38
Indicator for profile, level, version, etc.: See note.
    The version of EPUB, in this case "3.0", is identified in the version attribute of the root <package> element in the .opf file, which can often be found in a directory called "OEBPS" when the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into their component files. Note that this naming scheme, although conventional, is not required for EPUB 3.0. The official way to find the .opf file is through the mandatory META-INF/container.xml file.
PRONOM PUID: fmt/483
    PRONOM "outline only" entry does not differentiate between EPUB versions. See http://www.nationalarchives.gov.uk/PRONOM/fmt/483.
Wikidata Title ID: Q27196933
    Wikidata entry for EPUB 3. Covers all EPUB 3 versions. See https://www.wikidata.org/wiki/Q27196933.
Other: NF00556
    See https://www.archives.gov/files/lod/dpframework/id/NF00556.ttl; entry does not differentiate between EPUB versions.
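The magic numbers and the version indicator described above can be checked with a short Python sketch. It assumes the container namespace urn:oasis:names:tc:opendocument:xmlns:container for META-INF/container.xml and takes the first rootfile as the Package Document; the names looks_like_epub, epub_version, and build_sample_epub are hypothetical, and the sample archive is a skeleton rather than a valid publication.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

MIMETYPE = b"application/epub+zip"
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def looks_like_epub(data):
    """Check the byte signature: ZIP header, then 'mimetype', then the MIME type."""
    return (
        data[0:4] == b"PK\x03\x04"
        and data[30:38] == b"mimetype"
        and data[38:38 + len(MIMETYPE)] == MIMETYPE
    )


def epub_version(data):
    """Locate the Package Document via container.xml and read its version."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return None
        package = ET.fromstring(archive.read(rootfile.get("full-path")))
    return package.get("version")


def build_sample_epub():
    """Build a hypothetical skeleton container purely for demonstration."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/package.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    package_opf = '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"/>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression,
        # so that its content starts at byte offset 38.
        archive.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("OEBPS/package.opf", package_opf)
    return buffer.getvalue()


sample = build_sample_epub()
print(looks_like_epub(sample), epub_version(sample))
```

Reading the path from container.xml, rather than assuming an "OEBPS" directory, follows the point made above that the directory name is only a convention.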

Notes

General

Among the changes in EPUB 3 from EPUB 2 is the adoption of the ZIP-based container as the only serialization for an EPUB publication. In the list of changes between EPUB 2.0.1 and EPUB 3.0, 4.1.4 Filesystem Container states, "OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a 'Filesystem Container' abstraction. This change was made in conjunction with new restrictions in Publications 3.0 restricting references to remote resources in EPUB Publications to specific media types and contexts. Taken together, these changes mean that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container, and that EPUB files must in general contain all constituent parts of the Publication, with certain well-defined exceptions." Audio and video content may be stored remotely rather than in the EPUB container. All other content must be in the EPUB container.

Conforming EPUB reading systems must support: HTML5, XHTML, CSS, SVG, GIF, JPEG, PNG, MP3, and AAC (low complexity) in an MP4 wrapper [AAC_MP4_LC]. These are EPUB 3 Core Media Types that all Reading Systems must support. Publications may include resources of other media types, but for each such resource there must be an alternative resource of a Core Media Type, using methods defined in the EPUB specification.
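The fallback requirement can be sketched in Python for one of the available mechanisms. The sketch assumes that a manifest item names its alternative through a fallback attribute holding another item's id, and it uses a partial, illustrative set of Core Media Types derived from the list above (the full set in the specification is longer). The sample manifest and the function name items_without_core_fallback are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Partial, illustrative set of Core Media Types; not the complete list.
CORE_MEDIA_TYPES = {
    "application/xhtml+xml", "text/css", "image/svg+xml",
    "image/gif", "image/jpeg", "image/png", "audio/mpeg", "audio/mp4",
}

# Hypothetical manifest used only for this illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="pdf1" href="chapter1.pdf" media-type="application/pdf" fallback="ch1"/>
    <item id="tiff1" href="plate.tiff" media-type="image/tiff"/>
  </manifest>
</package>
"""


def items_without_core_fallback(opf_xml):
    """Return ids of items that never reach a Core Media Type via fallbacks."""
    root = ET.fromstring(opf_xml)
    items = {
        item.get("id"): item
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    problems = []
    for item_id, item in items.items():
        seen = set()
        current = item
        # Follow the fallback chain until a core type is found or it ends.
        while current is not None and current.get("media-type") not in CORE_MEDIA_TYPES:
            seen.add(current.get("id"))
            next_id = current.get("fallback")
            # Stop on a missing target or on a circular chain.
            current = items.get(next_id) if next_id and next_id not in seen else None
        if current is None:
            problems.append(item_id)
    return problems


print(items_without_core_fallback(SAMPLE_OPF))
```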

Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes using a SHA-1 digest of the publication's unique identifier, stripped of any whitespace characters.
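A minimal Python sketch of this obfuscation follows. It assumes that the modification is a byte-wise XOR of the first 1040 bytes with the 20-byte SHA-1 digest, repeated as needed, and that the whitespace to strip consists of space, tab, carriage return, and line feed; these details reflect one reading of the OCF specification and are not stated in the text above. The function names are hypothetical.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading bytes that are modified


def obfuscation_key(unique_identifier):
    """Derive the key: SHA-1 digest of the identifier with whitespace removed."""
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def toggle_obfuscation(font_bytes, unique_identifier):
    """XOR the leading bytes with the repeating key.

    Because XOR is its own inverse, the same call both obfuscates
    and restores a font.
    """
    key = obfuscation_key(unique_identifier)
    head = bytes(
        byte ^ key[index % len(key)]
        for index, byte in enumerate(font_bytes[:OBFUSCATED_LENGTH])
    )
    return head + font_bytes[OBFUSCATED_LENGTH:]


# Hypothetical stand-in for font data; a real font file would be read from disk.
font = bytes(range(256)) * 5
obfuscated = toggle_obfuscation(font, "urn:uuid:1234")
restored = toggle_obfuscation(obfuscated, "urn:uuid:1234")
print(restored == font)
```

A reader that lacks the publication's unique identifier cannot reverse the change, which is why the obfuscated font is effectively bound to the publication.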

History

The Open eBook Publication Structure or "OEB", originally produced in 1999, was the precursor to EPUB.

Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. OEBPS Version 1.0.1 [OEBPS_1_0], a maintenance release, was brought out in July 2001. OEBPS Version 1.2 [OEBPS_1_2], incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002.

EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.

EPUB 3.0 was approved as an IDPF Recommendation in October 2011. It was substantially different from EPUB 2, both in using only a single form for textual content and in having support for audio, video, and scripted interactivity (through JavaScript). Dropped in EPUB 3 were two EPUB 2 formats for text content, one based on the Digital Talking Book [DTB_2005] format and a second, based on XHTML 1.1, compatible with OEBPS_1_2. A single new encoding for textual Content Documents was based on HTML5/XHTML and CSS3, despite the fact that both of these W3C standards were still works in progress. SVG is supported for graphics and it is possible to have an EPUB 3 document whose "pages" consist only of graphics, for example for a graphic novel. Several legacy features are deprecated. Some legacy structures may be included for compatibility of EPUB 3 documents with existing EPUB 2 readers. EPUB 3 readers are expected to render publications using version 2 and version 3.


Format specifications


Useful references

URLs

Books, articles, etc.

Last Updated: 05/08/2024