Sustainability of Digital Formats: Planning for Library of Congress Collections
|Full name||EPUB, Electronic Publication, Version 2|
EPUB is a format for electronic publications with reflowable text in a marked-up document structure, with associated images for illustrations, all packaged in a container format. Reflowable text allows the text display to be optimized for the particular display device used by the reader of the EPUB-formatted book; this is in contrast to documents with pre-determined pagination. EPUB allows publishers to control document presentation through style-sheets. EPUB, Version 2 comprises three separate specifications: the Open Publication Structure (OPS), the Open Packaging Format (OPF), and the Open Container Format (OCF).
EPUB is the unifying term used to denote a collection of OPS Documents, an OPF Package file, and other files, typically in a variety of media types, including structured text and graphics, packaged in an OCF container that constitute a cohesive unit for publication, as defined by the EPUB standards. The container file for an EPUB has the extension .epub.
EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.
|Production phase||An EPUB file is likely to be used primarily as a final-state format, for dissemination to end-users|
|Relationship to other formats|
|Subtype of||EPUB_family, Electronic Publication (EPUB) File Format Family|
|Subtype of||OCF (Open Container Format) 2.0.1, based on version 6.3.0 of the ZIP archiving format.|
|Contains||OPF (Open Packaging Format) 2.0.1, not described separately at this site.|
|May contain||OEBPS_1_2, Open eBook Forum Publication Structure 1.2|
|May contain||DTB_2005, DTB (Digital Talking Book), 2005|
|May contain||GIF, GIF Graphics Interchange Format, Version 89a|
|May contain||JFIF, JFIF, JPEG File Interchange Format|
|May contain||PNG, Portable Network Graphics|
|May contain||SVG_1_1, Scalable Vector Graphics (SVG), Version 1.1|
|May contain||Other XML-based content types, including XML "islands" containing XML chunks based on non-preferred schemas or DTDs.|
|Has later version||EPUB_3_0, EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135-1:2014|
|LC experience or existing holdings||Preserving eBooks (2014) by Amy Kirchhoff and Sheila Morrissey [DPC Technology Watch Report 14-01] included the Library of Congress as one of four case-studies. According to that report, the Library of Congress reported at the time that, "Collections include files in PDF, HTML/XHTML, XML/TEI, and EPUB2 formats (LOC expects soon to have EPUB3 as well)." See also EPUB_3_0.|
As an XML-based format using publicly documented schemas that represent the logical structure of a publication, EPUB_2 satisfies most of the desired characteristics for formats for textual works, if the content files are not encrypted and the file is not subject to technological protection that inhibits long-term preservation and access.
See the Recommended Formats Statement for the Library of Congress format preferences for textual works in digital form.
|Disclosure||Open standard, developed under the auspices of the International Digital Publishing Forum (IDPF). No longer under active maintenance.|
|Documentation||Specifications for EPUB version 2.0.1 from IDPF.|
EPUB 3 superseded EPUB 2.0.1 in October 2011. By September 2014, EPUB 2 was considered obsolete. The information below relates to the adoption of EPUB 2 as of March 2011. The compilers of this resource have not investigated whether the tools mentioned are still available and still support version 2 of EPUB.
As of March 2011, it is clear that the EPUB specification has filled a consumer need for a reflowable text format that can be used with a variety of hardware and software readers, so that a purchased eBook can be read on more than one reading device owned by the purchaser. Some vendors feel that this portability will appeal to consumers, in contrast to proprietary formats that can only be read on devices supported by the vendor. EPUB is also popular as a distribution format for transcribed books out of copyright. Although the format is also sometimes used for books digitized by scanning and OCR, it is unsatisfactory unless the OCR quality is very high.
As of March 2011, hardware reading devices supporting the EPUB format include: iPad, iPhone, Sony, Nook, Kobo, and Android devices.
Software EPUB readers include: add-ons for browsers; Adobe Digital Editions; ibisReader; and web-based readers, such as Bookworm and BookGlutton. See EPUB eBook Readers from epubbooks.com for a list of reading applications and devices. Some readers can only handle files not protected by digital rights management (DRM).
Publishing software for producing EPUB publications includes: Adobe InDesign and Content Server; Calibre; free tools to convert to EPUB_2 from various formats, including TEI, Microsoft Word, and RTF at http://code.google.com/p/epub-tools/; and Sigil, an open source ePub/eBook editor.
Despite all the strong signs of adoption, the fact that work started on version 3 of EPUB so soon, and that it will be significantly different, might indicate that some active supporters of EPUB are looking for a different balance between publisher control and user convenience (flexibility, simplicity and cost-effectiveness). Alternatively, it might simply indicate that the technology for electronic publications is in flux. Embedded rich media content, which will be supported in EPUB, Version 3, will provide more of a challenge to developers of free software.
|Licensing and patents||No licensing concerns for production or use of content compliant with the EPUB specifications or core media types for Version 2.|
|Transparency||Text content must be in XML or XHTML, which rates highly for transparency. However, encryption is permitted. If content files are encrypted, the package file must contain the information necessary for decryption, including the key and algorithm used. Embedded fonts, if present, may be obfuscated (see Notes below), which also reduces transparency.|
|Self-documentation||The OPF packaging file can include unqualified Dublin Core (DCMES) metadata. It also provides structural metadata to relate the various content documents through a table of contents and to stipulate a natural reading order.|
|External dependencies||No dependencies for unencrypted publications. However, encrypted, protected publications usually depend for access on specific proprietary reader applications to satisfy the procedures and perform the particular decryption operations required by the DRM scheme selected by the publication's vendor.|
|Technical protection considerations||
In addition to support for encryption of content files within the OCF container, an optional element of the OCF container format can specify digital rights management terms and procedures. In practice, as of March 2011, most purchased EPUBs were protected by Adobe's ADEPT DRM scheme.
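As a rough illustration of the protection considerations above, the following sketch lists which of the optional OCF protection-related files are present in a container. It assumes the file names `META-INF/encryption.xml` and `META-INF/rights.xml` from the OCF container layout; the function name `protection_signals` is hypothetical. The presence of `encryption.xml` alone does not prove that content is DRM-protected, since the same file may describe only obfuscated fonts.

```python
import io
import zipfile

# Optional OCF files that hint at protection. The names are assumed from the
# OCF container layout; the descriptions are informal summaries.
PROTECTION_FILES = {
    "META-INF/encryption.xml": "encrypted or obfuscated resources",
    "META-INF/rights.xml": "digital rights management information",
}


def protection_signals(epub_bytes: bytes) -> dict:
    """Return the protection-related files found in an EPUB container.

    This only inspects file names; it does not parse or decrypt anything.
    """
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        names = set(archive.namelist())
    return {
        name: meaning
        for name, meaning in PROTECTION_FILES.items()
        if name in names
    }
```

An empty result suggests, but does not guarantee, that the publication can be opened without a specific proprietary reader application.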
|Normal rendering||Good support.|
|Integrity of document structure||The logical structure of a document is an essential feature of EPUB.|
|Integrity of layout and display||Publishers may choose to control some aspects of layout through style-sheets. However, flowable text, by definition, will break lines and paginate text differently depending on the reading platform and user choices.|
|Support for mathematics, formulae, etc.||Not supported.|
|Functionality beyond normal rendering||Flowable text can adapt to reading devices with a variety of form factors.|
|Filename extension||epub
||Recommended extension for the EPUB publication file in its container format.|
|Internet Media Type||application/epub+zip
||From OCF specification. See also IANA registration associated with EPUB 3.0.1, which mentions that OCF 2.0.1 also uses this media type.|
|Magic numbers||See note.||From OCF 2.0.1 specification:
|Indicator for profile, level, version, etc.||See note.||The version of EPUB, in this case "2.0", is identified in the version attribute of the root <package> element in the .opf file, which is typically found in the OEBPS directory when the contents of the .epub file are "unzipped", i.e., extracted from the ZIP archive into their component files.|
|Pronom PUID||fmt/483
||PRONOM entry does not differentiate between versions of EPUB. See http://www.nationalarchives.gov.uk/PRONOM/fmt/483.|
|Wikidata Title ID||Q56230180
||Wikidata entry for EPUB 2. Covers all editions of EPUB 2. See https://www.wikidata.org/wiki/Q56230180.|
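The magic-number and version indicators in the table above can be checked programmatically. The sketch below is illustrative only and rests on two assumptions not spelled out on this page: that the OCF rule for the `mimetype` file (first entry in the ZIP archive, stored uncompressed, no extra field) places the file name at byte offset 30 and the media type string at offset 38, and that the location of the .opf file is read from `META-INF/container.xml` rather than assumed to be the OEBPS directory. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

EPUB_MEDIA_TYPE = b"application/epub+zip"
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def has_epub_magic(data: bytes) -> bool:
    """Check the byte pattern that results from the OCF 'mimetype' rule.

    Assumption: the first ZIP entry is an uncompressed file named
    'mimetype' with no extra field, so the ZIP local-header signature is
    at offset 0, the name at offset 30, and the media type at offset 38.
    """
    end = 38 + len(EPUB_MEDIA_TYPE)
    return (
        data[0:4] == b"PK\x03\x04"
        and data[30:38] == b"mimetype"
        and data[38:end] == EPUB_MEDIA_TYPE
    )


def read_package_version(data: bytes) -> Optional[str]:
    """Return the version attribute of the root <package> element, if any."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None:
            return None
        # full-path points at the OPF package file inside the container.
        package = ET.fromstring(archive.read(rootfile.get("full-path")))
        return package.get("version")
```

For an EPUB 2 publication, `read_package_version` would be expected to return "2.0". A file that fails `has_epub_magic` may still be a readable ZIP archive, but it would not match the signature described in the OCF specification.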
Conforming EPUB reading systems must support: XML, XHTML, DTBOOK (including NCX), OEBPS_1_2, CSS, GIF, JPEG, PNG, and SVG. These are OPS Core Media Types that all Reading Systems must support and publications may include. Publications may include resources of other media types, but for each such resource there must be an alternative resource of an OPS Core Media Type using methods defined in this specification or the OPF specification.
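The fallback requirement described above can be sketched as a check over a simplified manifest. This is a minimal illustration under stated assumptions: the manifest is modeled as a plain dictionary rather than parsed from an OPF file, the `fallback` key mirrors the `fallback` attribute of OPF manifest items, and the media type strings in `CORE_MEDIA_TYPES` are the editor's reading of the OPS Core Media Types listed above. Other fallback mechanisms defined in the specifications (for example, for XML islands) are not covered. The function names are hypothetical.

```python
from typing import Dict, List

# Assumed media type strings for the OPS Core Media Types named above.
CORE_MEDIA_TYPES = {
    "application/xhtml+xml",
    "application/x-dtbook+xml",
    "application/x-dtbncx+xml",
    "application/xml",
    "text/x-oeb1-document",
    "text/x-oeb1-css",
    "text/css",
    "image/gif",
    "image/jpeg",
    "image/png",
    "image/svg+xml",
}


def resolves_to_core(item_id: str, manifest: Dict[str, dict]) -> bool:
    """Follow the fallback chain until a core media type is reached."""
    seen = set()
    current = item_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against circular fallback chains
        item = manifest.get(current)
        if item is None:
            return False  # dangling fallback reference
        if item.get("media-type") in CORE_MEDIA_TYPES:
            return True
        current = item.get("fallback")
    return False


def items_without_core_fallback(manifest: Dict[str, dict]) -> List[str]:
    """List manifest item ids that never reach a core media type."""
    return [i for i in manifest if not resolves_to_core(i, manifest)]
```

A manifest in which a PDF item declares an XHTML item as its fallback would pass this check, while an item of a non-core type with no fallback would be reported.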
Some vendors of proprietary fonts may only permit their use (and embedding) in EPUBs if the fonts are in some way bound to the particular publication and not available on the user's system for other purposes. EPUB supports a method of font obfuscation (also known as "mangling") for this purpose. Obfuscation of embedded fonts for EPUBs is achieved by modifying the first 1040 bytes using a SHA-1 digest of the publication's unique identifier.
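The obfuscation step can be illustrated with a short sketch. The text above states only that the first 1040 bytes are modified using a SHA-1 digest of the unique identifier; the sketch additionally assumes, based on the editor's understanding of the IDPF font obfuscation method, that whitespace is stripped from the identifier before hashing and that the modification is a byte-wise XOR with the 20-byte digest repeated cyclically. The function names are hypothetical.

```python
import hashlib

OBFUSCATED_PREFIX_LENGTH = 1040  # number of leading bytes that are modified


def obfuscation_key(unique_identifier: str) -> bytes:
    """Derive the 20-byte key from the publication's unique identifier.

    Assumption: spaces, tabs, carriage returns, and line feeds are
    removed before the identifier is hashed with SHA-1.
    """
    stripped = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def toggle_font_obfuscation(font_data: bytes, unique_identifier: str) -> bytes:
    """XOR the first 1040 bytes with the key; the rest is left unchanged.

    Because XOR is its own inverse, the same call both obfuscates and
    restores a font.
    """
    key = obfuscation_key(unique_identifier)
    head = font_data[:OBFUSCATED_PREFIX_LENGTH]
    mangled = bytes(b ^ key[i % len(key)] for i, b in enumerate(head))
    return mangled + font_data[OBFUSCATED_PREFIX_LENGTH:]
```

Since the key is derived entirely from an identifier stored in the publication itself, this is obfuscation rather than encryption: anyone holding the EPUB can recover the font, which is why it binds the font to the publication without strongly protecting it.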
The Open eBook Publication Structure or "OEBPS", originally produced in 1999, was the precursor to OPS.
Version 1.0 of the Publication Structure was created in the winter, spring, and summer of 1999 by the Open eBook Authoring Group. Following the release of OEBPS 1.0, the Open eBook Forum (OeBF) was formally incorporated in January 2000. Version 1.0.1, a maintenance release, was brought out in July 2001. OEBPS Version 1.2, incorporating new support for control by content providers over presentation along with other corrections and improvements, was released as a Recommended Specification in August 2002.
EPUB 2 was initially standardized in 2007. EPUB 2.0.1 was approved in 2010.
EPUB, Version 3, was approved as an IDPF Recommendation in October 2011. It is substantially different from EPUB, Version 2. Many existing features are dropped, including the use of the Digital Talking Book DTB_2005 as a document content format. The preferred content format for textual content in EPUB_3 is the XHTML serialization of HTML5. New features include support for rich media and MathML. The talking book functionality is replaced by a more general SMIL-based mechanism for media overlays and support for text-to-speech pronunciation hints.
By September 2014, EPUB 2 was considered obsolete.