Sustainability of Digital Formats: Planning for Library of Congress Collections

ONIX for Books

Identification and description

Full name ONIX (ONline Information eXchange) for Books
Description

ONIX for Books is an XML-based standard for metadata about books and book-related products. It provides a consistent way to share product information among stakeholders, including publishers, retailers, and supply chain partners. The format is the international standard for representing product information in electronic form and is heavily used around the globe. Although the format is named ONIX for Books, it can also describe other products and media, including audiobooks, recorded video, e-books, and educational software.

ONIX for Books XML files are referred to as ONIX messages and come in two types: Product Information Format Messages and Acknowledgment Messages. Product Information Format Messages, sometimes known as original ONIX messages, convey information about books and other book-related products between stakeholders and their computer systems. Acknowledgment Messages are returned to data senders by the recipients and list any errors encountered.

As stated in the October 2019 specification, "An overall requirement for both message types is that they must conform to the XML standard, meaning that they must be well-formed XML. It is also a requirement that ONIX messages are valid according to the associated RNG and XSD schemas (which are equivalent)." EDItEUR provides additional documentation on the three main ways to validate ONIX files: the ONIX DTD (Document Type Definition), classic XSD, and strict XSD. DTD validation checks only the XML tags, while classic XSD checks the tags and some of the data, including codes from the code lists. Strict XSD provides the most complete check, as it validates a larger selection of the data and ensures a higher degree of consistency.
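The first requirement quoted above, well-formedness, can be checked with a general-purpose XML parser. The following is a minimal sketch using only the Python standard library; the function name and the two sample strings are illustrative assumptions, not part of the ONIX specification. It does not perform RNG, XSD, or DTD validation, which requires a schema-aware tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This checks well-formedness only; it says nothing about validity
    against the ONIX RNG, XSD, or DTD schemas.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Illustrative inputs (hypothetical, heavily abbreviated messages).
good = '<ONIXMessage release="3.0"><Header/><Product/></ONIXMessage>'
bad = '<ONIXMessage release="3.0"><Header></ONIXMessage>'  # <Header> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A message that passes this check may still be invalid ONIX, so a check of this kind is best treated as a cheap first filter before schema validation.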

Per the 3.0.7 specification (released Oct 2019), an ONIX Product Information Format Message is broken down into four parts:

  • Start of the message (format and content dictated by the XML standard)
  • Message header block (non-repeating)
  • Body of the message describing the product
  • End of the message

The message header contains several data elements. The sender and the date of the message are both mandatory fields; the addressee is optional, as are additional fields such as language, price type, and currency. The body of an ONIX message consists of at least one product record, and there is no limit to the number of product records in a message. Each product record opens with the XML tag <Product> and ends with the matching closing tag </Product>. Product records are the fundamental unit of the ONIX message, and each should be labeled with a product identifier. An example of an ONIX message as described in the October 2019 3.0.7 specification is as follows:

<?xml version="1.0"?>
<ONIXMessage release="3.0">
  <Header>
    <!-- message header data elements -->
  </Header>
  <Product>
    <!-- record reference for product 1 -->
    <!-- product identifiers for product 1 -->
    <!-- block 1 product description -->
    <!-- block 2 marketing collateral detail -->
    <!-- block 3 content detail -->
    <!-- block 4 publishing detail -->
    <!-- block 5 related material -->
    <!-- block 6 product supply -->
  </Product>
</ONIXMessage>
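The four-part structure above (XML start, one header, one or more product records, end) can be inspected programmatically. The sketch below is one possible approach using the Python standard library; the sample message, the helper names, and the returned dictionary keys are assumptions made for illustration. Namespace prefixes are stripped so that the same code works whether or not a message declares a default namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton message with two (empty) product records.
SAMPLE = """<?xml version="1.0"?>
<ONIXMessage release="3.0">
  <Header>
    <!-- message header data elements -->
  </Header>
  <Product>
    <!-- product 1 -->
  </Product>
  <Product>
    <!-- product 2 -->
  </Product>
</ONIXMessage>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, if present, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text: str) -> dict:
    """Report the root element, release attribute, and top-level counts."""
    root = ET.fromstring(xml_text)
    children = [local_name(child.tag) for child in root]
    return {
        "root": local_name(root.tag),
        "release": root.get("release"),
        "header_count": children.count("Header"),
        "product_count": children.count("Product"),
    }


summary = summarize(SAMPLE)
print(summary)
```

A caller could treat a header count other than one, or a product count of zero, as a sign that the file is not a usable Product Information Format Message.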

EDItEUR's site provides an example of an ONIX 3.0 Product Information Message as a downloadable ZIP file containing the associated XML files.

ONIX Acknowledgment Messages are similarly constructed of four component parts, with the message header block and product information enclosed between the start and end of the message. As with ONIX Product Information Messages, the format and content of the start of an Acknowledgment Message are dictated by the XML standard. The main difference between the two message types is the body. The body of an Acknowledgment Message is often empty, consisting only of the <NoProduct/> empty data element. Alternatively, it may contain one or more Product records carrying reference number information and updated Product status information; as with Product Information Messages, there is no limit to the number of records. An example of an Acknowledgment Message from the 3.0.7 specification can be seen below:

<?xml version="1.0"?>
<ONIXMessageAcknowledgement release="3.0" xmlns="http://ns.editeur.org/onix/3.0/acknowledgment/reference">
  <Header>
    <!-- message header data elements -->
    <!-- status information for message -->
  </Header>
  <Product>
    <!-- record reference for product 1 -->
    <!-- status information for product 1 -->
  </Product>
</ONIXMessageAcknowledgement>

EDItEUR also provides a separate downloadable ZIP file for sample ONIX Acknowledgment Messages.
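A recipient of an Acknowledgment Message needs to distinguish the empty-body case (<NoProduct/>) from a body that carries Product records. The following sketch, again using only the Python standard library, shows one way to do this; the function name and both sample messages are illustrative assumptions, and the namespace URI is copied from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical acknowledgment with an empty body.
EMPTY_ACK = """<?xml version="1.0"?>
<ONIXMessageAcknowledgement release="3.0" xmlns="http://ns.editeur.org/onix/3.0/acknowledgment/reference">
  <Header/>
  <NoProduct/>
</ONIXMessageAcknowledgement>"""

# Hypothetical acknowledgment reporting on two product records.
PRODUCT_ACK = """<?xml version="1.0"?>
<ONIXMessageAcknowledgement release="3.0" xmlns="http://ns.editeur.org/onix/3.0/acknowledgment/reference">
  <Header/>
  <Product/>
  <Product/>
</ONIXMessageAcknowledgement>"""


def acknowledged_products(xml_text: str) -> int:
    """Count Product records in the body of an Acknowledgment Message.

    Returns 0 when the body consists only of <NoProduct/>.
    """
    root = ET.fromstring(xml_text)
    # Strip the '{namespace}' prefix that ElementTree adds to each tag.
    names = [child.tag.rsplit("}", 1)[-1] for child in root]
    if "NoProduct" in names:
        return 0
    return names.count("Product")


print(acknowledged_products(EMPTY_ACK))
print(acknowledged_products(PRODUCT_ACK))
```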

ONIX for Books defines several general attributes, including datestamp, sourcename, and sourcetype, which carry information about the content of the associated element. The datestamp attribute enables any data element to carry the date or time when it was last updated, which also gives an indication of the data's accuracy. The sourcename attribute gives the name of the source of the data, for example when a wholesaler passes information received from a publisher on to a retailer. If this attribute is not included, the data source should be assumed to be the sender of the ONIX message. The sourcetype attribute carries a code indicating the type of source or authority of the data. The examples listed below can be found in the 3.0.7 specification.

  • datestamp <CopiesSold datestamp="20100621">6400 copies of this edition sold</CopiesSold> (Sales figure last confirmed June 2010)
  • sourcename <x313 sourcename="XYZ Livres SA">01</x313> (XYZ is source of information)
  • sourcetype <x313 sourcetype="01">01</x313> (Source of information is publisher)
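The attribute rules above can be sketched in code as follows. This is an illustrative reading, not a reference implementation: the function names and the sender name "Example Wholesaler" are hypothetical, and the date parsing assumes that a datestamp value begins with an eight-digit YYYYMMDD date, as in the example from the specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def last_updated(element):
    """Return the datestamp attribute as a date, or None if it is absent."""
    raw = element.get("datestamp")
    if raw is None:
        return None
    # Assumption: the value starts with YYYYMMDD; any time portion is ignored.
    return datetime.strptime(raw[:8], "%Y%m%d").date()


def data_source(element, message_sender):
    """Return sourcename if present, otherwise fall back to the message sender."""
    return element.get("sourcename", message_sender)


copies = ET.fromstring(
    '<CopiesSold datestamp="20100621">6400 copies of this edition sold</CopiesSold>'
)
tagged = ET.fromstring('<x313 sourcename="XYZ Livres SA">01</x313>')

print(last_updated(copies))                        # date parsed from the attribute
print(data_source(tagged, "Example Wholesaler"))   # explicit source wins
print(data_source(copies, "Example Wholesaler"))   # no sourcename: sender assumed
```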

Further attributes apply only to a limited selection of data elements:

  • collationkey
  • dateformat
  • language
  • release
  • textcase
  • textformat
  • textscript

Elements in ONIX files can be named in two ways, known as "reference names" and "short tags". Reference names are generally expressed in plain language, such as <PersonNameInverted></PersonNameInverted>, while short tags are expressed as codes, such as <b037>. An expanded example can be seen below.

<Contributor>
  <SequenceNumber>2</SequenceNumber>
  <ContributorRole>A01</ContributorRole>
  <PersonNameInverted>Badenov, Boris</PersonNameInverted>
</Contributor>

The reference names example above can also be expressed in short tags:

<contributor>
  <b034>2</b034>
  <b035>A01</b035>
  <b037>Badenov, Boris</b037>
</contributor>

The reference name and short tag forms are identical in meaning, but they cannot be mixed within the same message. Tag converters exist for both ONIX 2.1 and ONIX 3.0 to translate reference names to short tags. Short tags make ONIX files considerably smaller but are equally complex to process.
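The idea behind a tag converter can be sketched with a simple lookup table. The mapping below covers only the element names used in the contributor example and is not a complete converter; the lowercase composite tag `contributor` is an assumption based on the convention, noted elsewhere in this description, that short-tag messages begin with <ONIXmessage>. A real converter would use the full tag list published by EDItEUR.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative mapping covering only the contributor example.
REFERENCE_TO_SHORT = {
    "Contributor": "contributor",  # assumption: composites use lowercase short tags
    "SequenceNumber": "b034",
    "ContributorRole": "b035",
    "PersonNameInverted": "b037",
}


def to_short_tags(element):
    """Rename an element and all its descendants in place.

    Raises KeyError on an unmapped tag rather than emitting a message that
    mixes reference names and short tags, which is not permitted.
    """
    for node in element.iter():
        node.tag = REFERENCE_TO_SHORT[node.tag]
    return element


reference = ET.fromstring(
    "<Contributor>"
    "<SequenceNumber>2</SequenceNumber>"
    "<ContributorRole>A01</ContributorRole>"
    "<PersonNameInverted>Badenov, Boris</PersonNameInverted>"
    "</Contributor>"
)
short = ET.tostring(to_short_tags(reference), encoding="unicode")
print(short)
```

Only the tags change; the element content (2, A01, and the name) is carried over untouched, which reflects the statement that the two forms are identical in meaning.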

The ONIX for Books format uses codelists, or controlled vocabularies, that form part of the shared semantics of an ONIX message. The codelists contain the numerical code values used in ONIX messages, each with an accompanying short label or note that defines the code's meaning. Codes are generally defined in English, but the labels can be translated into other languages without altering the meaning of the code itself. This ensures that a code value is understandable and unaffected by the languages used by the sender and recipient of the data. The codelist files are available in a variety of formats, including CSV, TXT, XSD, JSON, and XML.

Any DRM (Digital Rights Management) applied to a described product is declared in the <EpubTechnicalProtection> element. Page 50 of the October 2019 3.0.7 specification describes it as "An ONIX code specifying whether a digital product has DRM or other technical protection features. Optional, and repeatable if a product has two or more kinds of protection (ie different parts of a product are protected in different ways)." ONIX 3.0 also lets users record other usage constraints on their products with the <EpubUsageConstraint> group of elements, which describes whether printing, copy/paste, lending, sharing, and other kinds of use are permissible. These elements are unique to ONIX for Books 3.0, as this was not possible in ONIX 2.1.
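Because <EpubTechnicalProtection> is repeatable, a consumer should collect every occurrence rather than read a single value. The sketch below shows one way to gather the protection codes and usage constraints from a product record. The sample record is hypothetical: its code values are placeholders that would need to be interpreted against the relevant ONIX codelists, its placement of the elements inside <DescriptiveDetail> is an assumption, and it is written without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical product fragment; code values are placeholders.
PRODUCT = """<Product>
  <DescriptiveDetail>
    <EpubTechnicalProtection>01</EpubTechnicalProtection>
    <EpubTechnicalProtection>02</EpubTechnicalProtection>
    <EpubUsageConstraint>
      <EpubUsageType>02</EpubUsageType>
      <EpubUsageStatus>03</EpubUsageStatus>
    </EpubUsageConstraint>
  </DescriptiveDetail>
</Product>"""


def protection_codes(product):
    """Return every technical protection code declared for the product."""
    return [el.text for el in product.iter("EpubTechnicalProtection")]


def usage_constraints(product):
    """Return (usage type code, usage status code) pairs for the product."""
    return [
        (c.findtext("EpubUsageType"), c.findtext("EpubUsageStatus"))
        for c in product.iter("EpubUsageConstraint")
    ]


product = ET.fromstring(PRODUCT)
print(protection_codes(product))
print(usage_constraints(product))
```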

Although the format is titled ONIX for Books, it can describe non-book products, as mentioned above. This is done using the <ProductForm> element, which can describe audio, text, and other product formats. Code lists are critical here as well, since there are many code values for product forms. EDItEUR provides additional guidance for describing e-books and digital content using ONIX. The <ProductFormDetail> element lets users expand on the product form with additional format or medium information; it is optional and repeatable.

Production phase Can be used as initial, middle, or final-state format.
Relationship to other formats
    Defined via XML, XML (Extensible Markup Language)
    Defined via XML_SCHEMA, W3C XML Schema Language
    May have component XML_DTD, Document Type Definition

Local use

LC experience or existing holdings As a part of its ongoing monthly eBook deliveries from publishers, the Library has inventoried hundreds of thousands of ONIX for Books files on long-term storage. While ONIX files are normally given the .xml extension, publishers providing content to the Library have changed the file extension to .onx, which is neither a formalized nor a recognized extension for ONIX XML files. The Library of Congress has also published an ONIX 2.1 to MARC 21 mapping online.
LC preference According to the Recommended Formats Statement, ONIX is a preferred format for metadata records accompanying textual works in digital form, electronic serials, and musical scores.

Sustainability factors

Disclosure Open standard. Developed by EDItEUR in collaboration with the Digital Issues working group of the Association of American Publishers, and others. Currently jointly maintained by EDItEUR with the Book Industry Communication (UK) and the Book Industry Study Group (US). To be useful for global interoperability, an ONIX for Books XML message should be associated with a schema specification for the elements and tags it contains. Such schema specifications (see XML_DTD and XML_XSD) must also be disclosed.
    Documentation As stated on EDItEUR’s own website, "ONIX for Books standards are maintained by the EDItEUR ONIX Support Team in consultation with ONIX national groups, and under the direction of an international steering committee. Currently, Graham Bell has primary responsibility for the standard, with assistance from EDItEUR consultant Francis Cave." The October 2019 standard can be found here.
Adoption

The ONIX standard has been widely implemented and adopted around the world. EDItEUR documents a list of over 100 organizations that have implemented a version of ONIX for Books in their business. These organizations come from countries around the globe, including but not limited to the US, the Netherlands, Canada, the UK, Australia, Brazil, Sweden, and Italy.

According to EDItEUR, implementation of ONIX 3.0 is steadily growing. Amazon announced in 2019 that it would no longer accept version 2.1 ONIX data for print materials after December 2020 and would expect ONIX 3.0 data from all of its global domains. This will likely influence the industry, particularly in North America, as many American publishers and booksellers still use ONIX 2.1, which EDItEUR sunset in 2014.

Several software applications exist to assist in opening and editing ONIX files, with various additional functions for manipulating ONIX files to meet user needs. These applications include Onixsuite, Onixedit, Book Connect, Title Manager, and Booksonix. Many of them offer conversion functionality and support the input of CSV and Excel files (XLS, XLSB, or XLSX) for conversion to ONIX 3.0. Compatibility with older ONIX versions such as 2.1, and conversion from them to ONIX 3.0, is also heavily featured in many of these applications.

Research data compiled by Nielsen in 2016 in both the US and UK has indicated that robust metadata has led to increased book sales for publishers. This has helped to make the case for standards such as ONIX to become more widely adopted by a variety of stakeholders including: publishers, distributors, and booksellers.

The benefits of the ONIX for Books format to library systems have been documented as well. Rich metadata from ONIX records can be used to bolster content and metadata in library online catalogues. OCLC has done substantial work to create mappings from ONIX to MARC21, and EDItEUR links to OCLC's two publications on mapping ONIX 2.1 and ONIX 3.0 to MARC21. The Library of Congress has also published an ONIX 2.1 to MARC21 mapping.

    Licensing and patents The 3.0.7 standard states, "All ONIX standards and documentation – including this document – are copyright materials, made available free of charge for general use. A full license agreement that governs their use is available on the EDItEUR website. In particular, if you use any of the ONIX for Books Product Information Format schemas (RNG, XSD or DTD) (‘the schemas’), you will be deemed to have accepted these terms and conditions…." EDItEUR’s license grants users a "non-exclusive, non-transferable, royalty-free perpetual worldwide license to use its Standards and Specifications."
Transparency ONIX for Books is based on XML, which is human-readable and designed for straightforward automatic parsing. For the contents to be understood and used to their full capabilities by stakeholders, a well-documented XML_DTD, XML_XSD or XML Schema, or other specification is needed. Human-comprehensible element tags are advantageous for transparency.
Self-documentation XML is widely used as a syntax for metadata, and metadata for all purposes can be embedded in XML documents with appropriate schema specifications.
External dependencies ONIX 3.0 is supported by XML_DTD, RNG, and XML_XSD schemas that are vital to ensuring that ONIX messages meet the requirements of the standard and are broadly interoperable.
Technical protection considerations None

Quality and functionality factors

Text
Normal rendering XML can represent all Unicode characters, with UTF-8 being the default character encoding. The 3.0.7 ONIX specification from October 2019 highlights the importance of including explicit encoding declarations when non-ASCII characters are represented, which is to be expected given the global use of the ONIX format. Effective support for normal rendering depends on an appropriate DTD or schema specification. The 3.0.7 specification does not recommend DTDs for validation, because they do not make use of the code lists and are therefore insufficient; it recommends the RNG and XSD schemas instead.
Integrity of document structure XML is ideal for representing document structure.
Integrity of layout and display For textual content, best practice is to have the XML represent the logical document structure and use stylesheets to render the text in a form appropriate for the end user. Users have a choice of using reference names or short tags which may influence human readability.
Support for mathematics, formulae, etc. Formulae and complicated mathematical representations are not applicable. On its site, EDItEUR specifies how numerical values such as weights, dimensions, and prices should be formatted. The entry of numerical data is also governed by specific EDItEUR codelists; for example, codelists exist for weight and dimension metadata.
Functionality beyond normal rendering Not applicable.
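The point about encoding declarations under Normal rendering can be illustrated with a short round trip. This sketch, which assumes Python 3.8 or later for the xml_declaration parameter, serializes an element containing a non-ASCII character with an explicit UTF-8 declaration and reads it back; the contributor name is only an illustrative value.

```python
import xml.etree.ElementTree as ET

# Illustrative element with a non-ASCII character in its text.
name = ET.Element("PersonNameInverted")
name.text = "Čapek, Karel"

# Serialize to bytes with an explicit XML declaration naming the encoding.
data = ET.tostring(name, encoding="utf-8", xml_declaration=True)
print(data)

# A parser that honours the declaration recovers the original text.
round_tripped = ET.fromstring(data).text
print(round_tripped)
```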

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | xml | Common practice for XML document instances is to use the .xml extension. The particular schema or DTD should be declared within the document. The .onx extension is not a recognized extension for ONIX for Books files.
Internet Media Type | text/xml, application/xml | If an XML document is readable by casual users, text/xml is preferred. See RFC 3023 for further details.
Magic numbers | See note. | See XML.
Wikidata Title ID | Q7072761 | See https://www.wikidata.org/wiki/Q7072761.
Other | <ONIXMessage release="3.0"> | The start of an ONIX for Books message that uses reference names must consist of this line of XML; a message that uses short tags begins with <ONIXmessage> (note the lowercase m) instead. See Identification and description above for more information about short tags and reference names. Comments welcome.
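Since neither the .xml extension nor the unofficial .onx extension reliably identifies an ONIX file, the root element is a more dependable signal. The following sketch reads only as far as the first start tag and reports which tag style the message appears to use. The function name, the returned labels, and the file name in the usage note are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Root element names for the two tag styles described in this section.
ROOT_FLAVORS = {
    "ONIXMessage": "reference names",
    "ONIXmessage": "short tags",
}


def onix_flavor(stream):
    """Inspect the root element of a binary stream.

    Returns "reference names", "short tags", or None if the stream is not
    XML or its root element is not an ONIX message element.
    """
    try:
        for _event, element in ET.iterparse(stream, events=("start",)):
            # The first start event is the root; strip any namespace prefix.
            return ROOT_FLAVORS.get(element.tag.rsplit("}", 1)[-1])
    except ET.ParseError:
        return None
    return None


print(onix_flavor(io.BytesIO(b'<ONIXMessage release="3.0"><Header/></ONIXMessage>')))
print(onix_flavor(io.BytesIO(b'<ONIXmessage release="3.0"><header/></ONIXmessage>')))
print(onix_flavor(io.BytesIO(b"<html/>")))
```

For a file on disk, the same function could be called as `onix_flavor(open("delivery.onx", "rb"))` inside a `with` block, regardless of the extension the file carries.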

Notes

General  
History

ONIX for Books was developed in 1999 by the Digital Issues working group of the Association of American Publishers, EDItEUR, and a variety of other contributors. The first ONIX (ONline Information eXchange) version was released in January 2000. ONIX's primary goal was to reduce the difficulties of managing, distributing, and updating large quantities of metadata. The ONIX 1.0 standard was the culmination of early work on the Book Industry Communication's metadata specifications, the INDECS project, and EDItEUR's EPICS data dictionary. As stated in EDItEUR's FAQs, the standard's initial release was heavily influenced by the World Wide Web Consortium's release of the XML specification in 1998. Revisions of the 1.0 standard enhanced the metadata framework with more standardized terminologies, controlled vocabularies, and a more robust file format to promote data exchange. These revisions were 1.1, released in July 2000; 1.2, released in November 2000; and 1.2.1, released in April 2001.

Version 2.0 was published in July 2001, with a follow-up 2.1 version released in June 2003. A considerable number of American stakeholders still use version 2.1 even though EDItEUR has sunset 2.1 and all of its revisions.

The ONIX for Books Supply Update was released in August 2006 to meet specific demands for updating price, availability, and other supply-related data without re-sending a complete descriptive record for a product. As stated by EDItEUR, the Supply Update allows ONIX 2.1 Supply Detail and/or Market Representation data to be replaced without the need to send other components of an ONIX record.

Version 3.0 was released in 2009, with subsequent revisions issued beginning in October 2010. One of the most noticeable changes in version 3.0 is the naming of the specification documentation. "This Product Information Format Specification (the Specification) replaces, for ONIX for Books Release 3.0, the separate documents which in previous ONIX releases were given the titles Product Record Format, XML Message Specification and Overview and Data Elements. The change of title reflects the fact that the single Specification document now includes sections describing the top-level XML message structure and the message header as well as the product record itself." Revision 3.0.1, released in January 2012, added functionality for implementation in East Asia in particular. This includes increased flexibility of the format for use with Japanese, Chinese, and other East Asian languages, as documented in EDItEUR's 3.0.1 revision summary. The 3.0 revision summaries are all publicly accessible on EDItEUR's website. Previous versions are fully compatible.


Format specifications


Useful references

URLs


Last Updated: 05/10/2022