Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Mobipocket File Format |
---|---|
Description |
The Mobipocket file format (originally with extension .prc and later .mobi) is a proprietary, partially documented, binary format for ebooks. It was introduced by Mobipocket SA in 2000 as a binary format for the distribution of ebooks as reflowable text and thus usable on devices of different sizes, including handheld devices. Mobipocket SA, which offered a service for distributing ebooks based on the Mobipocket file format, was acquired by Amazon in 2005. In association with the launch of the Kindle in 2007, Amazon continued to use the format, with the .mobi extension, in its digital publishing services and developed tools to support its use in publishing workflows. The features Amazon added to the format were not documented, but the tools were widely used and other ebook distributors supported the format. Files with the .mobi extension appear to have been the primary means of uploading ebooks for publication through Amazon until 2017. In 2017, Amazon formally introduced new "Kindle Create" tools to its Kindle Direct Publishing service. The new tools do not use a file with the .mobi extension.
The Mobipocket file format, initially referred to as the XDOC PRC format, is based on a format, known as PRC, in use in 2000 for documents on Palm Personal Digital Assistants (PDAs). Mobipocket, and later Amazon, essentially used PRC as a binary wrapper for content conforming to the Open eBook Publication Structure (OeBPS) standard and its successor EPUB formats, through EPUB 3. A document in one of these source formats comprises a cluster of files: one or more HTML files for the book's textual content, optional image files, and a mandatory "OPF" document in XML that provides a manifest for all the component files in the package and other data to support presentation of a table of contents and publication metadata.
Mobipocket and, later, Amazon provided free tools for authors and publishers to compile the source OeBPS (aka OPF) or EPUB into the binary wrapper, with options for compression, encryption, and digital rights management (DRM). The process also included optimization of the packaged file for faster performance on reading devices and indexing to enable search features. The companies provided user guides but no documentation on the format of the output .prc or .mobi file. Although the format continued to be based on the same wrapper, the internal details changed over time and varied according to which tool was used to create the file. The most significant tools are listed below. The initial link provided for each tool is to a web page as it appeared shortly after release of the tool.
For more on the history of the Mobipocket format, see History Notes and a sequence of web page captures by the Internet Archive from Mobipocket and Amazon websites. In short, the exact form of files with the .mobi extension will vary according to the tools used to create them and the options chosen. Some workflows will have used .mobi as a distribution format for end users. In other workflows, it will have been an exchange format used by an author or publisher to submit ebooks to Amazon. Tools based on reverse engineering, such as KindleUnpack and Calibre, are able to identify different variants and extract the component parts from .mobi files without technical protection.
Distributors of free ebooks for public domain, out-of-copyright, or promotional books have adopted the format, perhaps because it is easy to use KindleGen in a workflow that creates a .mobi file automatically from an EPUB file and thus makes content available for a wide variety of readers. For example, the Internet Archive has scanned out-of-copyright books for many libraries and makes them accessible in a number of formats, including EPUB and .mobi (labeled as Kindle). According to Upcoming changes in epub generation, a blog post from April 2015, the EPUBs are now generated on the fly from OCR output for the scanned pages and the .mobi (Kindle) files are also generated on the fly. For an example, see "Bank Robbers" at https://archive.org/details/bankrobbers00grif/; metadata extracted from its Kindle .mobi file indicates that it was produced by KindleGen 1.0. An example from Project Gutenberg at http://www.gutenberg.org/ebooks/58853 is available as a Kindle file that is a combination .mobi file, with metadata indicating that it was generated by KindleGen 2.9.
Common to all .mobi files is the header structure. The file starts with a 16-byte header with basic information needed to open the file, such as codes for compression and encryption schemes applied. Reverse engineering (see MOBI on MobileRead Wiki) has revealed that this is followed by what is referred to as Record 0, which comprises a MOBI header, an optional EXTH header, followed by the full title of the ebook. Record 0 is followed by all the rest of the content, starting with the basic Mobipocket data usable by all readers. In the context of combination .mobi files, this basic content is sometimes referred to as KF7 (to distinguish from KF8) or Mobi 6 (because 6 is the version of Mobi declared in the actual combination .mobi file). If the file is a combination .mobi, the data for KF8 and any other optional content will follow. Any reader that only understands the older, basic format will simply ignore the later data.
See the British Library's Mobipocket Format Preservation Assessment, released initially in February 2018, for an assessment of the Mobipocket file format with regard to long-term preservation risks and the practicalities of preserving data in the format. |
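As an illustration only, the header layout described above can be sketched in Python. The byte offsets used here are assumptions taken from reverse-engineered descriptions such as the MobileRead Wiki, not from Mobipocket or Amazon documentation. In that reverse-engineered layout, the 16-byte header with the compression and encryption codes sits at the start of Record 0, which is located through the Palm database header and record list. The function name `read_record0_summary` is hypothetical.

```python
import struct


def read_record0_summary(data: bytes) -> dict:
    """Summarize Record 0 of a .mobi file held in memory (a sketch, not a validator)."""
    if len(data) < 86:
        raise ValueError("too short to be a Palm database")
    # Assumed Palm database layout: big-endian record count at offset 76,
    # then 8-byte record entries whose first 4 bytes give the record's file offset.
    (num_records,) = struct.unpack_from(">H", data, 76)
    if num_records < 1:
        raise ValueError("Palm database contains no records")
    (start,) = struct.unpack_from(">L", data, 78)
    if num_records > 1:
        (end,) = struct.unpack_from(">L", data, 86)
    else:
        end = len(data)
    rec0 = data[start:end]
    if len(rec0) < 132 or rec0[16:20] != b"MOBI":
        raise ValueError("Record 0 does not contain a MOBI header")

    # 16-byte header: compression code first, encryption code at offset 12.
    compression, _, text_length, _, _, encryption, _ = struct.unpack_from(">HHLHHHH", rec0, 0)
    # Assumed MOBI header fields, offsets counted from the start of Record 0.
    header_length, _, encoding, _, version = struct.unpack_from(">5L", rec0, 20)
    name_offset, name_length = struct.unpack_from(">2L", rec0, 84)
    (exth_flags,) = struct.unpack_from(">L", rec0, 128)

    codec = "utf-8" if encoding == 65001 else "cp1252"
    title = rec0[name_offset:name_offset + name_length].decode(codec, errors="replace")
    return {
        "compression": compression,        # reportedly 1 = none, 2 = PalmDOC, 17480 = HUFF/CDIC
        "encryption": encryption,          # reportedly 0 = none
        "text_length": text_length,
        "mobi_header_length": header_length,
        "mobi_version": version,           # 6 for basic Mobipocket, 8 for KF8
        "has_exth": bool(exth_flags & 0x40),
        "title": title,
    }
```

A caller would pass the full contents of a file, for example `read_record0_summary(open(path, "rb").read())`, where `path` names a .mobi file without technical protection.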
Production phase | Originally a final-state format for distribution of ebooks (often with DRM or password protection) to end users. Later used as a middle-state format for preparing ebooks for Amazon's Kindle Publishing service. These files with the .mobi extension would include the same content in more than one version. End users downloading from Amazon would receive an ebook in the most appropriate format for their Kindle device. |
Relationship to other formats | |
Modification of | DOC file format for Palm Personal Digital Assistant, not described at this website at this time. See Format Specifications below. |
May contain | OeBPS 1.0, OeBPS (Open Ebook Forum Publication Structure) 1.0. Content from an Open Ebook Publication may have been compiled into the binary Mobipocket format with either .prc or .mobi extension. |
May contain | EPUB 3, Content in any chronological version of EPUB through EPUB 3 may have been compiled into a Mobipocket file with .mobi extension. |
LC experience or existing holdings | As of early 2019, the Library of Congress has approximately 23,000 .mobi files created when books (mainly out-of-copyright) were digitized for the Library by the Internet Archive. These files were generated automatically via EPUB files, themselves generated from OCR of scanned page images. The Library does not currently provide direct access to these .mobi files; instead its catalog records link to the Internet Archive where a .mobi download (created on the fly from OCR output) is available via the "Kindle" option. Example: https://lccn.loc.gov/25003682 with link to http://hdl.loc.gov/loc.gdc/scd0001.00139838658. |
---|---|
LC preference | The Mobipocket format is not included as preferred or acceptable in Recommended Formats Statement | Textual Works, Digital as of early 2019. Preferred formats include EPUB3 and PDF/A. |
Disclosure |
The Mobipocket file format was proprietary to Mobipocket SA from 2000 to 2005 and then to Amazon. The Mobipocket website provided only partial information about the binary format, enough to support publishers of ebooks using its software tools for creating ebooks. The format was declared to be based on the PRC format used in the Palm OS. PRC is a generic container; Mobipocket did not document exactly how its format employed the PRC structure. Palm provided partial information about its PRC and PDB formats. Neither Mobipocket nor Palm provided details about the compression or encryption that could be applied, although a 2000 presentation for publishers on the Mobipocket site made it clear that these steps were part of the process for creating Mobipocket files. Amazon did not provide additional documentation about the binary format after acquiring Mobipocket in 2005 or when it updated the tools to generate combination .mobi files in late 2011. |
---|---|
Documentation |
Documentation for binary format: Minimal public documentation was provided for the binary .mobi file format by either Mobipocket or Amazon. What documentation existed remained on the Mobipocket site until 2016, including the statement that the binary format was a modified DOC file for the Palm OS used on Palm PDAs. See Format Specifications below for links to very limited documentation from the Mobipocket and Palm OS websites and to information compiled as a result of reverse engineering. Documentation for source format: The information provided by Mobipocket in Mobipocket file format documentation was mainly about the source content used to derive the binary distribution files and is useful for assessing the functionality that the binary format could support in a reading device. The source format was based on the Open eBook Publication Structure, an open standard (see OeBPS). Mobipocket documented the Open eBook HTML tags supported in the format. Amazon updated the tools to use various EPUB versions, through EPUB 3, and provided guidance, including lists of supported HTML tags in, e.g., HTML Tags Supported (from 2007); List of supported HTML tags and CSS elements (from 2012); and List of supported HTML tags and CSS elements (from 2018). |
Adoption |
As of early 2019, all generations of Kindle readers can read .mobi files if not protected by DRM. However, older Kindles will not necessarily display newer ebooks as intended by the author/publisher, if recent feature enhancements have been used. For example, in 2011, Amazon introduced the KF8 format and started including it in combination .mobi files. KF8 added many new formatting capabilities and allowed embedded fonts. Kindles from before 2011 may display the book with simpler formatting.
The Mobipocket file format was a ground-breaking format for ebooks. As of early 2019, it is typically included in any comparison of ebook formats, along with EPUB and PDF. See Useful References below.
A widely recommended free tool for opening .mobi files is Calibre, an open-source ebook manager with support for viewing, converting, and cataloging ebooks in major ebook formats. Other free programs that can open .mobi files are listed in What Is a MOBI File? from LifeWire, including Stanza Desktop, Sumatra PDF, Mobi File Reader, FBReader, Okular, and Mobipocket Reader. MobiComparison from MobileRead Wiki (last updated in 2012) notes that not all readers support all features. Online converters supporting Mobipocket files include: DocsPal, Zamzar, and AConvert.com. KindleUnpack will unpack most .mobi files created by KindleGen (and without DRM) into their original components, although not with the original filenames, which are lost in the compilation process.
As of early 2019, Amazon's KindleGen page is still online and provides access to KindleGen 2.9. EPUB editors, such as Sigil, which has a KindleGen plugin, can be used to prepare .mobi files for submission to Amazon. Calibre also includes an EPUB editor; see Calibre: Editing e-books. Some services that offer inexpensive book design support to authors use KindleGen to create combination .mobi files. For example, see FAQ topic from Pressbooks.
In its 2018 preservation assessment for the Mobipocket format, the British Library noted, "Few memory institutions appear to be collecting e-books in Mobipocket format at the present time. Most seem to be primarily focused on other formats, such as EPUB or PDF." The British Library's own collection of ebooks received through Non-Print Legal Deposit (NPLD) also consists mainly of EPUB 2 and PDF. The Mobipocket format is used as a distribution format by a number of services providing access to ebooks, including the Internet Archive, Project Gutenberg, and ManyBooks.net. |
Licensing and patents | The .mobi file format is proprietary and Amazon has updated its internal details to support new features of Kindle devices. However, the compilers of this resource have found no evidence that Amazon attempts to impose any form of licensing on the format per se. Amazon does impose license terms on related software tools it makes available to authors and publishers of ebooks. Comments welcome. |
Transparency |
The Mobipocket format has supported compression since its introduction in 2000 [see The XDOC PRC File Format]. Mobipocket Creator for Publishers offered compression options, apparently identical between 2005 and 2015. KindleGen offered the same options. |
Self-documentation | Both OeBPS and EPUB can store metadata elements from the Dublin Core Metadata Element Set (DCMES). As described in Description above, starting at the 17th byte, the .mobi file contains a record often referred to as Record 0. Record 0 for an ebook usually consists of a MOBI header section and an EXTH header section followed by the full "name" of the book, i.e. dc:Title. Reverse engineering has identified types of EXTH entry that correspond to most other DCMES elements: dc:Creator; dc:Description; dc:Identifier (using ISBN); dc:Date; dc:Contributor; dc:Rights; dc:Subject; dc:Type; dc:Language; and dc:Source. An EXTH header section with some of these elements has been found to exist in all ebooks in .mobi format examined by the compilers of this resource. Comments welcome. |
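The correspondence between EXTH entries and Dublin Core elements can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the EXTH layout (the identifier "EXTH", a header length, a record count, then records consisting of a 4-byte type, a 4-byte length that includes those 8 bytes, and the value) and the numeric record types in `EXTH_TO_DC` come from reverse-engineered descriptions such as the MobileRead Wiki, not from Amazon documentation. The function name `parse_exth_dublin_core` is hypothetical, values are assumed to be UTF-8, and only a subset of the elements listed above is mapped.

```python
import struct

# Assumed EXTH record types for a subset of Dublin Core elements.
EXTH_TO_DC = {
    100: "dc:Creator",
    103: "dc:Description",
    104: "dc:Identifier",   # ISBN
    105: "dc:Subject",
    106: "dc:Date",
    108: "dc:Contributor",
    109: "dc:Rights",
}


def parse_exth_dublin_core(record0: bytes) -> dict:
    """Collect Dublin Core-like values from the EXTH section of Record 0, if present."""
    if len(record0) < 24 or record0[16:20] != b"MOBI":
        return {}
    # Assumption: the EXTH section follows the MOBI header, which begins at
    # offset 16 of Record 0 and declares its own length at offset 20.
    (mobi_header_length,) = struct.unpack_from(">L", record0, 20)
    start = 16 + mobi_header_length
    if record0[start:start + 4] != b"EXTH":
        return {}
    _exth_length, count = struct.unpack_from(">2L", record0, start + 4)

    found = {}
    pos = start + 12
    for _ in range(count):
        if pos + 8 > len(record0):
            break  # truncated section; stop rather than guess
        rec_type, rec_length = struct.unpack_from(">2L", record0, pos)
        if rec_length < 8:
            break  # malformed record length
        value = record0[pos + 8:pos + rec_length]
        label = EXTH_TO_DC.get(rec_type)
        if label is not None:
            found.setdefault(label, []).append(value.decode("utf-8", errors="replace"))
        pos += rec_length
    return found
```

Values are collected in lists because an element such as dc:Subject may occur more than once in a single file.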
External dependencies | None beyond software or device that can read an ebook in this format. Because the internal content of a .mobi file has varied with new features introduced by Amazon, older viewers may not successfully display all .mobi files as intended by creators/authors. Comments welcome. |
Technical protection considerations | The Mobipocket format has supported encryption since its introduction in 2000 [see The XDOC PRC File Format]. Mobipocket Creator for Publishers offered security and encryption options, apparently identical between 2005 and 2015. |
Text | |
---|---|
Normal rendering | The binary Mobipocket file format is not designed as an editable text format. The characteristics of the textual content in a Mobipocket file are governed by the underlying OeBPS or EPUB document, both of which typically have text in a declared character encoding (commonly UTF-8) and organized in reading order. Reader devices and software can support text search and extraction. |
Integrity of document structure | Representation of the logical structure of a document is an essential feature of an OeBPS or EPUB document. This structure is retained in the binary format. Reader devices and software can take advantage of the structure to support navigation. Note: .mobi files based on EPUBs derived automatically from OCR will often have little or no structure. The introduction in 2011 of support for EPUB 3, based on HTML5 and CSS, provided a richer navigation structure. Compare Open-eBook HTML tags from Mobipocket in 2005 with List of supported HTML tags and CSS elements | major enhancements in KF8 from Amazon in 2012. |
Integrity of layout and display | The Mobipocket format was designed primarily for reflowable text and to allow the user to select font family and size. Hence the author does not fully control the layout seen by the user. The basic Mobipocket format (as created prior to the 2011 introduction of KindleGen 2 and the KF8 format, or included as the Mobi 6 format in a combination .mobi generated by KindleGen 2.x) was very limited. See Mobi | Format limitations from the MobileRead Wiki. KF8 introduced support for HTML5 and CSS3, adding over 150 new formatting capabilities including embedded fonts, drop caps, and author control over line spacing, alignment, justification, margin, color, etc. See Kindle Format 8 from early 2012. KF8 also introduced fixed layout capabilities aimed at graphic novels, comics, and children’s books. |
Support for mathematics, formulae, etc. | The only way to include equations and formulae in Mobipocket files (including KF8) is as images. In Math on Kindle: How to make equations and figures look good on any Kindle device or app, Jack Lewis gives advice on how to prepare bitmaps or SVG for effective display of mathematics in .mobi files. |
Functionality beyond normal rendering |
Early Mobipocket files could support interactivity, for example, to support quizzes. See Notes below. The combination .mobi files introduced by Amazon in late 2011 have the capability to incorporate more than one version of the same content: the basic Mobipocket format (identified internally as Mobi version 6); the richer KF8 format (identified internally as Mobi version 8); and the source format processed by KindleGen 2.x, embedded as a ZIP file and identified by "SRCS." According to MOBI: Compilation Records from MobileRead Wiki, KindleGen had an undocumented option to suppress the SRCS record that was removed in 2010. |
Tag | Value | Note |
---|---|---|
Filename extension | mobi, prc | Both extensions have been used. Mobipocket originally used the .prc extension. By November 2006 documentation mentioned the .mobi extension. The compilers of this resource have been unable to determine exactly when the .mobi extension was introduced. However, it seems probable that it would be in 2006, when the Mobigen tool replaced PRCgen. Comments welcome. The extension .prc is used for a number of different file formats; see PRC file extension from File-Extensions.org. |
Internet Media Type | application/x-mobipocket-ebook, application/vnd.amazon.ebook | The first value has been widely used, including by Amazon. See Amazon Macie: Content Type. The second value is occasionally mentioned, e.g. in the list CELLAR media types as used in the EU Publications Office CELLAR project, but was never officially registered with IANA. In 2016, Amazon registered application/vnd.amazon.mobi8-ebook with IANA for the version of MOBI for which it used the .azw3 extension. |
Magic numbers | ASCII: BOOKMOBI, Hex: 424F4F4B4D4F4249; ASCII: TEXtREAd, Hex: 5445587452454164 | These strings are found at byte 61, according to PRONOM PUID: fmt/396. According to Palm Database Format from MobileRead Wiki, the first 32 bytes of a file are a name for the file and for ebooks are often from the book title, sometimes with author as well. |
Pronom PUID | fmt/396 | See https://www.nationalarchives.gov.uk/PRONOM/fmt/396. |
Wikidata Title ID | Q27996279 | See https://www.wikidata.org/wiki/Q27996279. |
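A minimal Python sketch of identification by the magic numbers listed above, offered as an illustration rather than a complete format identifier. It assumes that "byte 61" counts from 1, so the signature begins at zero-based offset 60, and that the 32-byte name is padded with NUL bytes. The second hex value decodes to the mixed-case ASCII string TEXtREAd. The function name `sniff_mobipocket` is hypothetical.

```python
# Signatures from the Magic numbers entry, keyed by their byte values.
SIGNATURES = {
    bytes.fromhex("424F4F4B4D4F4249"): "BOOKMOBI",
    bytes.fromhex("5445587452454164"): "TEXtREAd",
}
MAGIC_OFFSET = 60  # assumption: "byte 61" is 1-based
NAME_LENGTH = 32   # the first 32 bytes hold a name for the file


def sniff_mobipocket(data: bytes):
    """Return (signature, name) if a listed signature is present, else None."""
    magic = data[MAGIC_OFFSET:MAGIC_OFFSET + 8]
    label = SIGNATURES.get(magic)
    if label is None:
        return None
    # Keep only the bytes before the first NUL; decode leniently because the
    # character encoding of the name is not declared at this point in the file.
    raw_name = data[:NAME_LENGTH].split(b"\x00", 1)[0]
    return label, raw_name.decode("latin-1")
```

Reading the first 68 bytes of a file is enough for this check. A match indicates only that the Palm database wrapper carries one of these type and creator codes; it does not show which tool produced the file or whether technical protection is applied.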
General |
Early Mobipocket files could support interactivity, for example, by including JavaScript. This is because the underlying PRC file could contain program code in addition to data, so that a PRC file could be a stand-alone application. However, the compilers of this description have not determined whether any of the tools supplied by Mobipocket or Amazon facilitated this. Javascript in mobi ebooks? in a 2010 Q&A thread states "Javascript support was dropped when the Java based Mobipocket Reader was developed." Mobipocket Reader on MobileRead Wiki does not indicate when the Java-based reader was introduced. Major updates to Mobipocket Reader were made in November 2005 (version 5.0) and February 2007 (version 6.0). The last version of Mobipocket Reader was version 6.2. See also RIP: Mobipocket 2000-2011. Comments welcome. |
---|---|
History |
Mobipocket SA was incorporated in France in March 2000 and established a server for distributing ebooks in the Mobipocket file format, usually, but not always, protected by Digital Rights Management (DRM). The earliest Mobipocket.com home page captured by the Internet Archive was in August 2000. The company produced the Mobipocket Reader software for mobile phones, personal digital assistants (PDA) and desktop operating systems, starting with the Palm PDA and Windows CE. The binary distribution format for Mobipocket ebooks was described as the XDOC PRC format, based on the PalmOS DOC format and using the .prc file extension. Mobipocket SA was acquired by Amazon in 2005. Amazon retained the Mobipocket service until late 2011, when the Mobipocket.com site stopped publishing and distributing ebooks (see below). Meanwhile, in 2007, Amazon brought out the first Kindle and launched its Digital Text Platform (DTP) for submitting content for conversion into Kindle-compatible form. DTP accepted .mobi or .prc files; see Amazon DTP Quickstart Guide. By July 2009, Amazon had launched a service, initially called Amazon's eBook Program, renamed by August 2009 as Amazon Kindle's Publishing Program. The technology originating with Mobipocket was adopted and extended by Amazon for its Kindle reader in a sequence of formats with names beginning AZW and KF. Files with the .mobi extension appear to have been the primary means of uploading ebooks for publication until 2017. In late 2011, Amazon closed the Mobipocket.com service, informing publishers that submissions to the Mobipocket service would cease and that in January 2012, Mobipocket ebooks would be removed. See RIP: Mobipocket 2000-2011. Existing documentation for the format and some free ebooks remained online (apparently unchanged) on the Mobipocket website through 2016. See Mobipocket.com in November 2016 around time of closure. 
Amazon has supported reading of .mobi and .prc files without DRM protection on Kindles through early 2019. See Wikipedia entry on Amazon_Kindle for a chronology of Kindle devices. However, since 2017, the primary mechanisms for submitting content through Kindle Direct Publishing use different proprietary formats. KPF is intended as a format that authors and publishers can modify using Amazon tools. It is a ZIP-based format that incorporates a descriptive KCB file in JSON format and a KDF file containing the main book content in an SQLite database. KFX books delivered by Amazon are bundles of files, protected by DRM, that comprise an encrypted main container, a metadata container, and optional auxiliary containers. See KFX Conversion Output Plugin and entry for KFX from MobileRead Wiki. |
|