Sustainability of Digital Formats: Planning for Library of Congress Collections |
|
Full name | EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135:2014 |
---|---|
Description |
EPUB is a format for electronic publications with reflowable text in marked up document structure with associated images for illustrations, all in a container format. Reflowable text allows the text display to be optimized for the particular display device used by the reader of the EPUB-formatted book; this is in contrast to documents with pre-determined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 3 also provides for audio, including synchronization of text and audio, and for text-to-speech synthesis. Video may be embedded, but readers need not support video. Creators must provide a still image as fallback for a video clip. The EPUB 3.0 specification comprises an overview and four separate specification documents: EPUB Publications 3.0, EPUB Content Documents 3.0, EPUB Open Container Format (OCF) 3.0, and EPUB Media Overlays 3.0.
EPUB is the unifying term used to denote a collection of Content Documents, a Package Document, at least one Navigation Document, and other supporting files, typically in a variety of media types, including structured text and graphics, packaged in a ZIP-based OCF container that constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB complying with the EPUB 3.0 specifications has the extension .epub; changing the extension to .zip will permit exploration of the individual files. According to 2.1 Package Document in the EPUB 3.0 Overview, an EPUB Publication includes a single XML-based Package Document. This (a) specifies all the publication's constituent content documents and other required resources, through a manifest element, (b) defines a reading order for linear consumption, through a spine element, and (c) associates publication-level metadata and navigation information. An EPUB Navigation Document in HTML5/XHTML is a required component of an EPUB Publication; it provides the basis for both machine-readable and human-readable navigation. A mandatory toc nav element defines the primary navigation hierarchy and must be consistent with the spine. Other nav elements can support navigation options familiar from printed books, such as lists of illustrations, or from electronic documents, such as marks for significant structural components. Although only one XML-based Package Document is permitted, according to 4.5 Fallbacks, multiple instances in alternate formats of a complete work can be delivered in a single EPUB file, by defining multiple rootfile elements in the OCF container file. This may be used, for example, so that a formatted graphic novel defined via a sequence of SVG pages can be accompanied by an accessible text version defined via XHTML. Two supplementary specifications were published later by IDPF. 
The first, published as an informational document in 2012, specified how support for fixed-layout documents could be provided: EPUB 3 Fixed-Layout Documents. In particular, this added a property rendition:layout with permitted values "reflowable" and "pre-paginated" to be defined for the entire publication or for an individual content document listed in the spine. The second supplementary specification was approved as an IDPF Recommendation in 2014: EPUB Canonical Fragment Identifier (epubcfi) Specification. This defined a method for referencing arbitrary content within an EPUB Publication to support external linking to specific locations (e.g., chapters or paragraphs) in a work. EPUB 3.0 was approved by IDPF as a Recommendation in October 2011. EPUB 3.0 was submitted to ISO/IEC JTC1/SC34 as a Draft Technical Specification via the JTC 1 fast-track procedure by the Korean national standards body and published as ISO/IEC TS 30135 parts 1-7 in 2014. Each of these seven ISO/IEC documents is identical to its IDPF equivalent; see http://idpf.org/epub/30/. |
Production phase | An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users |
Relationship to other formats | |
Subtype of | EPUB_family, Electronic Publication (EPUB) File Format Family |
Has earlier version | EPUB_2, EPUB, Electronic Publication, Version 2. Last minor version of EPUB, Version 2 was 2.0.1, approved in 2010. There are substantial changes between EPUB 2 and EPUB 3.0. |
Has later version | EPUB_3_0_1, EPUB, Electronic Publication, Version 3.0.1 (2014). ISO/IEC 23736:2020 |
May contain | HTML_5, HyperText Markup Language (HTML) 5. EPUB 3.0 uses the XML syntax for HTML, i.e. the successor to XHTML. |
May contain | GIF, GIF Graphics Interchange Format, Version 89a |
May contain | JFIF, JFIF, JPEG File Interchange Format |
May contain | PNG, Portable Network Graphics |
May contain | SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1. EPUB 3.0 specifies a slightly restricted version of SVG 1.1. In particular, animation objects are not permitted. |
May contain | MP3_ENC, MP3 Audio Encoding. Assumed to be wrapped in the widely used de facto file format MP3_FF which wraps MP3 encoding with optional ID3 metadata blocks. |
May contain | MP4_FF_2_AAC, MPEG-4 File Format, V.2, with Advanced Audio Encoding. Limited to Low Complexity audio compression. See AAC_MP4_LC, AAC (MPEG-4) Low Complexity Object. |
May contain | VP8, VP8 Video Codec |
May contain | MPEG-4_AVC, MPEG-4, Advanced Video Coding (Part 10) (H.264) |
May contain | An EPUB 3.0 Package Document may contain one or more links to bibliographic records for the EPUB Publication in other schemas, including MARCXML, MODS, ONIX, and XMP. Such records may be included in the EPUB Container or referred to by URI. |
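The relationship between the manifest and spine elements of the Package Document, as outlined in the Description above, can be illustrated with a minimal Python sketch. The sample Package Document, its file names, and the function name `reading_order` are invented for illustration; only the element names and the OPF namespace come from the EPUB specifications.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB Package Documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> list[str]:
    """Return manifest hrefs in the linear order given by the spine."""
    root = ET.fromstring(opf_xml)
    # manifest: maps each item id to the href of a constituent resource
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # spine: ordered idref values that point back into the manifest
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# Hypothetical Package Document, trimmed to the parts discussed here.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2011-10-11T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""

print(reading_order(SAMPLE_OPF))  # ['chapter1.xhtml', 'chapter2.xhtml']
```

Note that the reading order follows the spine, not the order in which items happen to appear in the manifest.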
LC experience or existing holdings | The Library of Congress has received some ebooks in EPUB 3.0 format for its collections. An open-access example can be seen at https://www.loc.gov/item/2019299120/. |
---|---|
LC preference |
As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB 3.0 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted, if the file is not subject to technological protection that inhibits long-term preservation and access, and if all content is stored within the EPUB container. Bibliographic metadata records, for example, in the ONIX schema, may optionally be included in the EPUB container or may be available through a link to an external record. The Library of Congress would want to receive or access such metadata records in conjunction with ingestion of an EPUB publication. The Library of Congress Recommended Formats Statement (RFS) lists EPUB 3.0 as a Preferred format for Textual Works - Digital. |
Disclosure |
Open standard, developed under the auspices of the International Digital Publishing Forum (IDPF). Approved as a Technical Specification by ISO/IEC JTC1 and published as ISO/IEC TS 30135:2014 in seven parts. Within ISO and IEC, EPUB is considered by a special joint working group (ISO/IEC JTC 1/SC 34/JWG 7). JWG7 spans several ISO and IEC committees: JTC 1/SC 34 (Document description and processing languages), ISO TC 46/SC 4 (Technical interoperability), and IEC/TC 100/TA 10 (Multimedia e-publishing and e-book technologies). |
---|---|
Documentation | Specifications for EPUB version 3.0 from IDPF. |
Adoption |
Version 3.0 of EPUB introduced several features that were welcomed by publishers and users, such as better support for graphic publications and for mathematics. A minor update, EPUB 3.0.1, was published in 2014, primarily to add flexibility for publishers and for maintenance of the specification. Support by readers for rendering of fixed-layout publications was made mandatory. See EPUB 3.0.1 Changes from EPUB 3.0. The compilers of this resource have made no attempt to distinguish the level of adoption of EPUB 3.0 from that of EPUB 3.0.1. Comments welcome. These two editions of the EPUB specification have been widely adopted. Lists of software for reading and editing EPUB documents are available in the Wikipedia entry for EPUB. The compilers of this resource have not confirmed whether the information on individual applications is up to date. Among the earliest tools for supporting the deployment of EPUB 3.0 were EpubCheck (a free, open-source validator) and a reader plug-in for the Chrome browser from Readium, an IDPF project started to accelerate adoption of EPUB 3.0. Before the EPUB 3.0 specification was formally published, Adobe introduced support for some EPUB 3.0 features in InDesign as part of Creative Suite 5.5 (released April 2011). This included support for Japanese scripts and embedded rich media. For current support, see Adobe InDesign User Guide: Export content for EPUB. As of early 2020, most desktop publishing and many word-processing applications provide options to export to EPUB. QuarkXPress offers options to export in reflowable or fixed-layout EPUB format. Apple provides templates designed for creating books and provides advanced book creation options in Pages; these options are based on EPUB. WordPerfect provides tools for publishing ebooks in EPUB or MOBI format. Writer2ePub (link via Internet Archive) is an extension for OpenOffice.org or LibreOffice that creates an ePub file from any file format that Writer can read. |
Licensing and patents | No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for version 3.0. |
Transparency |
EPUB 3 allows the use of either UTF-8 or UTF-16 for encoding resources in an EPUB publication. Text content must be in XHTML/HTML5, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including key and algorithm used. Embedded fonts may be obfuscated (see Notes below), which also reduces transparency. Any interactive functionality supported by embedded JavaScript will be harder to preserve for the long term than static reflowable content. |
Self-documentation |
The mandatory EPUB Package Document can include unqualified Dublin Core (DCMES) metadata and readers must recognize these elements. dc:title, dc:identifier, and dc:language are mandatory. If more than one title is present, titles are required to be given title-type properties, to allow for series/collection titles, subtitles, short titles, edition statements, etc. Also mandatory is the dcterms:modified property, which is combined with dc:identifier to act as an identifier for a particular package. A meta element may be used to define and populate other metadata elements. In addition, a link can be made to externally stored metadata records in other schemas. The EPUB Package Document also includes a manifest of component files. A mandatory component file is an EPUB Navigation Document, which provides structural metadata to relate the various content documents through a table of contents. The mandatory spine element in the Package Document stipulates a natural reading order. Accessibility features: EPUB files have good support for accessibility features. See EPUB (Electronic Publication) File Format Family for accessibility information. |
External dependencies | No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor. |
Technical protection considerations |
In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management (DRM) terms and procedures. Commercial publishers can be expected to protect their EPUB files with DRM. The lack of a standard for DRM has led to fragmentation in the market: different retailers use non-interoperable DRM schemes that are tied in with eBook reader devices or apps. Embedded third-party fonts may be "obfuscated" by partial encryption. See Notes below for more information. The result is that any reader tool has to be capable of performing the decryption in order to be able to use the intended fonts. |
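As a rough illustration of how an ingest workflow might flag the protection measures described above, the following Python sketch looks for two optional files in the container. The file names META-INF/encryption.xml and META-INF/rights.xml reflect this editor's understanding of the OCF specification and are not stated in the text above; the function names and the in-memory sample are hypothetical.

```python
import io
import zipfile


def protection_hints(epub_bytes: bytes) -> dict[str, bool]:
    """Report whether the optional OCF encryption and rights files are present."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        names = set(zf.namelist())
    return {
        # Assumed OCF location for information about encrypted resources.
        "encryption": "META-INF/encryption.xml" in names,
        # Assumed OCF location for DRM terms and procedures.
        "rights": "META-INF/rights.xml" in names,
    }


def demo_container(extra_files: dict[str, str]) -> bytes:
    """Build a tiny in-memory ZIP for demonstration; it is not a valid EPUB."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        for name, text in extra_files.items():
            zf.writestr(name, text)
    return buf.getvalue()


plain = demo_container({})
protected = demo_container({"META-INF/encryption.xml": "<encryption/>"})
print(protection_hints(plain))
print(protection_hints(protected))
```

The presence of an encryption file is only a hint. A publication whose only protected resources are obfuscated fonts may also carry one, so a fuller check would inspect which resources and algorithms the file lists.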
Text | |
---|---|
Normal rendering | Good support. |
Integrity of document structure | The logical structure of a document is an essential feature of EPUB. |
Integrity of layout and display | Publishers may choose to control some aspects of layout through style-sheets. However, flowable text, by definition, will break lines and paginate text differently depending on the reading platform and user choices. |
Support for mathematics, formulae, etc. | EPUB 3.0 can contain MathML markup. |
Functionality beyond normal rendering | Flowable text can adapt to reading devices with a variety of form factors. Synchronization of audio and text is supported. A pronunciation lexicon can be embedded to support text-to-speech rendering. |
Tag | Value | Note |
---|---|---|
Filename extension | epub | Recommended extension for the EPUB container file. |
Internet Media Type | application/epub+zip | From OCF specification. |
Magic numbers | See note. | From OCF specification: |
Indicator for profile, level, version, etc. | See note. | The version of EPUB, in this case "3.0", is identified in the version attribute of the root <package> element in the .opf file, which can often be found in a directory called "OEBPS" when the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into its component files. Note that this naming scheme, although conventional, is not required for EPUB 3.0. The official way to find the .opf file is through the mandatory META-INF/container.xml file. |
Pronom PUID | fmt/483 | PRONOM "outline only" entry does not differentiate between EPUB versions. See http://www.nationalarchives.gov.uk/PRONOM/fmt/483. |
Wikidata Title ID | Q27196933 | WikiData entry for EPUB 3. Covers all EPUB 3 versions. See https://www.wikidata.org/wiki/Q27196933. |
Other | NF00556 | See https://www.archives.gov/files/lod/dpframework/id/NF00556.ttl; entry does not differentiate between EPUB versions. |
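The version lookup described in the table above (locate the .opf file through META-INF/container.xml, then read the version attribute of the root package element) might be sketched in Python as follows. The function names, the sample path content/book.opf, and the in-memory test container are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OCF container.xml file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def epub_version(epub_bytes: bytes) -> str:
    """Read the version attribute of the Package Document named in container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml lists no rootfile")
        # full-path gives the location of the .opf file inside the archive.
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
    return package.get("version")


def build_sample() -> bytes:
    """Build a tiny in-memory container for demonstration; it is not a valid EPUB."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="content/book.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"/>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container_xml)
        zf.writestr("content/book.opf", opf)
    return buf.getvalue()


print(epub_version(build_sample()))  # 3.0
```

The sample deliberately places the Package Document outside an "OEBPS" directory to show that the lookup does not rely on that convention. Only the first rootfile is read; a container with multiple rootfile elements would need additional handling.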
General |
Among the changes in EPUB 3 from EPUB 2 is the adoption of the ZIP-based container as the only serialization for an EPUB publication. In the list of changes between EPUB 2.0.1 and EPUB 3.0, 4.1.4 Filesystem Container states, "OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a 'Filesystem Container' abstraction. This change was made in conjunction with new restrictions in Publications 3.0 restricting references to remote resources in EPUB Publications to specific media types and contexts. Taken together, these changes mean that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container, and that EPUB files must in general contain all constituent parts of the Publication, with certain well-defined exceptions." Audio and video content may be stored remotely rather than in the EPUB container. All other content must be in the EPUB container. Conforming EPUB reading systems must support: HTML5, XHTML, CSS, SVG, GIF, JPEG, PNG, MP3, and AAC (low complexity) in an MP4 wrapper [AAC_MP4_LC]. These are EPUB 3 Core Media Types that all Reading Systems must support. Publications may include resources of other media types, but for each such resource there must be an alternative resource of a Core Media Type, using methods defined in the EPUB specification. Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes using a SHA-1 digest of the publication's unique identifier, stripped of any whitespace characters. |
---|---|
History |
The Open eBook Publication Structure or "OEB", originally produced in 1999, was the precursor to EPUB. Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. OEBPS Version 1.0.1 [OEBPS_1_0], a maintenance release, was brought out in July 2001. OEBPS Version 1.2 [OEBPS_1_2], incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002. EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010. EPUB 3.0 was approved as an IDPF Recommendation in October 2011. It was substantially different from EPUB 2, both in using only a single form for textual content and in having support for audio, video, and scripted interactivity (through JavaScript). Dropped in EPUB 3 were two EPUB 2 formats for text content, one based on the Digital Talking Book [DTB_2005] format and a second, based on XHTML 1.1, compatible with OEBPS_1_2. A single new encoding for textual Content Documents was based on HTML5/XHTML and CSS3, despite the fact that both of these W3C standards were still works in progress. SVG is supported for graphics and it is possible to have an EPUB 3 document whose "pages" consist only of graphics, for example for a graphic novel. Several legacy features are deprecated. Some legacy structures may be included for compatibility of EPUB 3 documents with existing EPUB 2 readers. EPUB 3 readers are expected to render publications using version 2 and version 3. |
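The font obfuscation method mentioned in the General notes (modifying the first 1040 bytes of a font using a SHA-1 digest of the publication's unique identifier, stripped of whitespace) can be sketched in Python. The notes do not say how the digest is applied; this sketch assumes a byte-wise XOR with the 20-byte digest repeated cyclically, which is this editor's reading of the OCF specification. The function name and sample values are hypothetical.

```python
import hashlib

OBFUSCATED_LENGTH = 1040  # number of leading bytes that are modified


def obfuscate_font(font: bytes, unique_identifier: str) -> bytes:
    """Obfuscate (or restore) a font; XOR makes the operation its own inverse."""
    # Strip whitespace characters from the identifier before hashing.
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    key = hashlib.sha1(stripped.encode("utf-8")).digest()  # 20-byte key
    # Assumption: each leading byte is XORed with the key, cycling through it.
    head = bytes(
        b ^ key[i % len(key)]
        for i, b in enumerate(font[:OBFUSCATED_LENGTH])
    )
    # Bytes beyond the first 1040 are left untouched.
    return head + font[OBFUSCATED_LENGTH:]


# Stand-in data; a real font file would be read from the container.
fake_font = bytes(range(256)) * 8
identifier = "urn:uuid:00000000-0000-0000-0000-000000000000"

mangled = obfuscate_font(fake_font, identifier)
restored = obfuscate_font(mangled, identifier)
print(restored == fake_font)  # True
```

Because the key is derived from the publication's own identifier, the scheme binds a font to one publication rather than keeping it secret, which matches the licensing motivation given in the notes.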
|