Sustainability of Digital Formats: Planning for Library of Congress Collections

BITS (Book Interchange Tag Suite), version 2.0

Identification and description

Full name BITS (Book Interchange Tag Suite), version 2.0 [based on JATS: Journal Article Tag Suite, version 1.1, ANSI/NISO Z39.96-2015]
Description

The Book Interchange Tag Suite (BITS) version 2.0, published in February 2016, contains an XML model for books that is based on the Journal Article Tag Suite (JATS; ANSI/NISO Z39.96-2015) version 1.1 [see JATS_1]. The intent of BITS is to provide a common format in which publishers and archives can exchange final book content, including book parts such as chapters. The tag set is designed to support interchange, archiving, format conversion, and publishing for scientific, technical, and medical books. Although supported by the National Library of Medicine, the book model should be usable beyond life sciences publishing, just as the JATS journal article models have proved useful in physics, social sciences, linguistics, and poetry. The tag suite supports markup for metadata and the narrative content of a book, metadata and narrative content for book components, and collection-level metadata for book sets and book series. The BITS Book Interchange DTD is a superset customization of the ANSI/NISO JATS standard, with added elements and attributes for describing the textual and graphical content of books and book components, as well as a package for the interchange of parts of books. The BITS specification is managed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM).

There are two top-level elements in the BITS Book DTD:

  • the Book element (<book>), which contains an entire document such as a textbook or a monograph. This description focuses on the markup for a whole book.
  • the Book Part Wrapper element (<book-part-wrapper>), which contains a book part such as a "chapter" or "module" that needs to be handled as a discrete unit. The elements permitted in a <book-part-wrapper> are very similar to those permitted in a <book>.

When both the metadata and the text of a book are to be tagged in XML, a <book> may include the components listed below. None can be repeated except Collection Metadata, which is repeatable for a book that is part of more than one collection. The elements permitted within <book> must be in the order listed. Technically, all elements are optional.

  • <collection-meta> -- bibliographic metadata describing a book set or series to which this book or book part belongs.
  • <book-meta> -- bibliographic metadata for the book, for example, the title of the book, the date of publication, the publisher, a copyright statement, etc. This is not the textual front matter that appears at the beginning of a book.
  • <front-matter> -- the front matter element (<front-matter>) contains any textual front material for a book, such as a Dedication, Foreword, or Preface.
  • <book-body> -- contains the narrative of the work, the main textual and graphic content of the book. The body of a book contains book parts (<book-part>), which may be called parts, sections, chapters, modules, lessons, or whatever divisions a publisher has named. Book parts are recursive, so they may contain other book parts. For example, “Part 3” of a book could contain several chapters, each of which could have a foreword, the body of the chapter, one or more appendices, and a reference list.
  • <book-back> -- contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references. The back matter may also contain <floats-group>, a container element for all the “floating” objects such as tables, figures, and sidebars in a book. The back matter of book parts may also contain separate <floats-group> elements.
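
The ordering and repetition rules above can be illustrated with a small sketch in Python, using only the standard library. The sample document, the helper name `check_book_children`, and its messages are illustrative assumptions. The check covers only the five children of <book> named in this description and is no substitute for validation against the BITS DTD.

```python
import xml.etree.ElementTree as ET

# Order of the children of <book>, as listed in this description.
BOOK_CHILD_ORDER = ["collection-meta", "book-meta", "front-matter",
                    "book-body", "book-back"]

# Hypothetical, empty skeleton; real documents carry far more content.
SAMPLE = """<book>
  <collection-meta/>
  <collection-meta/>
  <book-meta/>
  <front-matter/>
  <book-body/>
  <book-back/>
</book>"""


def check_book_children(root):
    """Return a list of problems with the order or repetition of <book> children."""
    problems = []
    last_rank = -1
    seen = set()
    for child in root:
        if child.tag not in BOOK_CHILD_ORDER:
            continue  # anything else is outside the scope of this sketch
        rank = BOOK_CHILD_ORDER.index(child.tag)
        if rank < last_rank:
            problems.append(f"<{child.tag}> is out of order")
        # Only collection metadata may appear more than once.
        if child.tag in seen and child.tag != "collection-meta":
            problems.append(f"<{child.tag}> may not be repeated")
        seen.add(child.tag)
        last_rank = max(last_rank, rank)
    return problems


print(check_book_children(ET.fromstring(SAMPLE)))
```

Because all of the listed elements are optional, an empty <book> passes this check as well; the sketch flags only misordered or wrongly repeated children.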

BITS is based on experience with an earlier series of NLM Book DTDs written to describe and mark up volumes for the NCBI online libraries. See NLM Literature Archive. However, the technical details are not based on those DTDs.

Production phase The Book Interchange Tag Suite is intended for exchanging works in their final published state, for archiving and re-use rather than for representing any particular layout as originally published online or in print.
Relationship to other formats
    Modification of JATS_1, JATS, Journal Article Tag Suite, NISO Z39.96, Versions 1.x
    Has earlier version BITS, Book Interchange Tag Suite, version 1, not described separately on this website at this time.
    Defined via XML_DTD, XML Document Type Definition (DTD)

Local use

LC experience or existing holdings

The Library of Congress is not aware of any electronic books in BITS format that have been added to the collections to date. Note that for books, the preference is for hard-bound editions on archival paper, when available.

LC preference

As of 2016, the Library of Congress Recommended Formats Statement (RFS) for Textual Works - Digital lists the BITS format first in order of preference for books in digital form.


Sustainability factors

Disclosure Openly documented by NCBI/NLM and freely downloadable. All components of the BITS Book Interchange Tag Suite are in the public domain.
    Documentation

The Tag Library for the most recent release of the BITS Tag Set is available at the following URI: https://jats.nlm.nih.gov/extensions/bits/tag-library/. The documentation for BITS, version 2 is at https://jats.nlm.nih.gov/extensions/bits/tag-library/2.0/.

Adoption

BITS 2.0 is a relatively new format as of early 2017 and the compilers of this resource are unable to predict how rapidly or widely it will be adopted. Comments welcome.

The interest of NCBI/NLM in developing a DTD for books was to have an XML-based format in which publishers could contribute book content to its digital archive, to be made available through Bookshelf, the platform the National Library of Medicine uses to provide free access to books and documents in life science and healthcare. A participating publisher is expected to provide the full text of books in an XML format that conforms to an acceptable book DTD (Document Type Definition). Acceptable DTDs include BITS and an earlier series of NLM Book DTDs. NCBI BookShelf Tagging Guidelines provides guidelines for the use of BITS, version 2.0.

Portico accepts books formatted in BITS and the earlier series of NLM Book DTD specifications into its preservation service. In 2016, Portico received 322 files in BITS 1.0 format from two publishers. In the same period, ten publishers submitted over 360,000 files in three chronological versions of the NLM Book DTD.

The Publications Office for the EU has selected BITS (Book Interchange Tag Suite) as the basis for an XML mark-up model suitable for the production of its general publications.

Many national archives and institutional repositories include XML as a preferred format or in a list of formats for which there is a high confidence in ability to preserve and provide access when a schema is available, the character encoding is explicitly stated and an XSLT stylesheet for conversion to HTML exists. BITS satisfies these requirements. Examples: Deep Blue Preservation and Format Support Policy from the University of Michigan; National Archives of Norway; and the Florida Digital Archive.

    Licensing and patents No licensing or patent issues. The tag sets are in the public domain.
Transparency Rates highly for transparency. Text content for books is in XML, and hence viewable in basic editors, web browsers, etc. Elements have understandable tag-names, and document instances are in natural reading order.
Self-documentation The DTD includes a rich set of elements for metadata at the book, book-part, and collection level. The <book> element is expected to include the book content and full descriptive metadata.
External dependencies None.
Technical protection considerations None.

Quality and functionality factors

Text
Normal rendering Excellent support.
Integrity of document structure Documenting the logical structure of a book, including its relation to a book series, is an essential feature of BITS DTDs.
Integrity of layout and display As stated in the introduction to the tag library, "As was true for JATS, the intent of BITS is to support marking up the content of material so that it can be reused, repurposed, and made more discoverable. This purpose implies, as it does in JATS, that the ability to reproduce a particular book format is not a goal."
Support for mathematics, formulae, etc. MathML and TeX math can be embedded. Integrity of rendering may be constrained by the capabilities of MathML and rendering tools. Various ways to represent chemical structures can be used.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | xml | For textual content files.
Magic numbers | See note. | As for many XML-based formats, there is no guaranteed magic number or other internal signature to identify the format automatically. See discussion in Notes below.

Notes

General

Differences from other book markup schemes: As described in The Roads Not Taken in the general introduction to the BITS 2.0 Tag Library, some structures that have been modeled as elements in other public book DTDs and schemas are not included explicitly in the BITS tag suite or are structured differently. Examples selected for explanation are:

  • Book Metadata -- The book metadata is held in the element <book-meta> and not within <front-matter>, although some of the metadata elements are usually displayed in the front of the book along with other front matter elements, such as a foreword or table of contents.
  • CCC Statement -- There is no element specifically for a Copyright Clearance Center statement. Such a statement may be tagged as a license paragraph (<license-p>) inside a license statement (<license>), which may include the price tagged as a <price>. The @license-type attribute may be set to "CCC-statement".
  • Colophon -- No special structure exists. A colophon may be tagged as a paragraph, as a section within the body of the book, as a book part in the back matter of a book, or as a book part.
  • Contributor List -- There is no special structure for such lists; they can be tagged using ordinary structures (lists, definition lists, tables, paragraphs, etc.) within a <front-matter-part>, within the narrative front matter, or within the back matter. Such lists should be in addition to the contributor names listed in <contrib> in the metadata of a book or book part.
  • Frontispiece -- No special structure exists; the material should be tagged using ordinary structures within a <front-matter-part> as part of the narrative front matter. While the <styled-content> element "could" be used to capture the special formatting typical in a Frontispiece, this is discouraged. The typical purpose for tagging a Frontispiece in BITS is to make the information content discoverable, not to replicate the look and feel of the document.
  • Introduction -- There is no explicitly named "Introduction" element, because this name may be applied to a front-matter component, a section of the body, or an entire book part.
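
As an illustration of the CCC Statement convention in the list above, the following Python sketch builds such a license statement with the standard `xml.etree.ElementTree` module. The statement text and the price value are invented placeholders, and placing <price> inside <license-p> is this sketch's reading of the description rather than a confirmed requirement. Only the element and attribute names come from the text above.

```python
import xml.etree.ElementTree as ET

# A <license> flagged as a CCC statement through @license-type.
license_el = ET.Element("license", {"license-type": "CCC-statement"})

# The statement text goes in a license paragraph (placeholder wording).
license_p = ET.SubElement(license_el, "license-p")
license_p.text = "Placeholder Copyright Clearance Center statement. Fee: "

# The price, if given, is tagged as <price> (placeholder value).
price = ET.SubElement(license_p, "price")
price.text = "0.00"

ccc_xml = ET.tostring(license_el, encoding="unicode")
print(ccc_xml)
```

In a real document this fragment would sit inside the permissions metadata of the book or book part; that surrounding context is omitted here.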

Identifying BITS version and variant used in a book file: Although it will usually be obvious from the beginning of a file conforming to this family of DTDs which chronological version of the tag suite and which model variant it uses, there is no guaranteed magic number or other signature to identify the format automatically. If the file was generated using the DTD (rather than the W3C XML Schema), it is likely to have the following string or something similar near the beginning of the file:

  • <!DOCTYPE book PUBLIC "-//NLM//DTD BITS Book Interchange DTD v2.0 20151225//EN"
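
A heuristic check for that public identifier might look like the following Python sketch. The function name `sniff_bits`, the regular expression, and the sample text (including its system identifier) are assumptions made for illustration. A file that omits the DOCTYPE declaration, for example one validated against the W3C XML Schema, will not be recognized.

```python
import re

# Matches a DOCTYPE with a public identifier of the shape shown above and
# captures the declared root element name and the BITS version number.
BITS_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+([\w-]+)\s+PUBLIC\s+'
    r'"-//NLM//DTD BITS Book Interchange DTD v([\d.]+) \d{8}//EN"'
)


def sniff_bits(head):
    """Return (root_element, version) if the text looks like BITS, else None."""
    match = BITS_DOCTYPE.search(head)
    if match is None:
        return None  # no DOCTYPE hint; the file may still be BITS
    return match.group(1), match.group(2)


# Hypothetical start of a file; the system identifier is a placeholder.
sample_head = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE book PUBLIC '
    '"-//NLM//DTD BITS Book Interchange DTD v2.0 20151225//EN" '
    '"BITS-book2.dtd">\n'
    '<book>'
)
print(sniff_bits(sample_head))
```

In practice only the first few kilobytes of a file need to be read and passed to such a function, since the declaration precedes the root element.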
History

The BITS Book Tag Set is not based directly on the NCBI/NLM Book (Bookshelf) Tag Set that was part of the NCBI/NLM family of DTDs that preceded JATS. Book Interchange Tag Suite (BITS), version 1.0 was published in December 2013. Book Interchange Tag Suite (BITS), version 2.0 was published in February 2016.

According to BITS 2.0 and JATS 1.1 Changes, BITS went to a new version number with 2.0 because not all the changes from BITS 1.0 were backward compatible. Changes were made to Ruby Markup (inherited from changes to JATS between BITS versions); to question and answer markup based on user feedback about the inadequacies of the original BITS models; and to Index and Table of Contents structures to make future modifications less disruptive.


Last Updated: Thursday, 16-Mar-2017 14:20:54 EDT