Sustainability of Digital Formats: Planning for Library of Congress Collections


JATS, Journal Article Tag Suite, NISO Z39.96, Versions 1.x


Identification and description

Full name JATS: Journal Article Tag Suite, versions 1.0 and 1.1, NISO Z39.96:2012-2015
Description

ANSI/NISO Z39.96 is a specification for Standardized Markup for Journal Articles, commonly known as JATS (Journal Article Tag Suite). JATS is based on a tag suite and DTDs developed originally by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM). See NCBIArch_1 for information on this predecessor format, often referred to as the "NLM DTD."

Since 2009, JATS has been maintained under the auspices of NISO and given the number Z39.96; see http://www.niso.org/workrooms/journalmarkup. The JATS Tag Suite has three subsets: one intended as an interchange format with which publishers transmit final journal content for Archiving; one intended for Publishing; and a third intended to encourage Authoring using practices that support the Publishing and Archiving stages of an article's lifecycle. See Notes below for more detail on the three variants. This description focuses on the Archiving and Interchange subset, which is the most permissive and inclusive of the three. The Tag Library for the latest version of this subset (version 1.1 as of March 2017) can be found at https://jats.nlm.nih.gov/archiving/. As stated on that page, "the intent of the Journal Archiving and Interchange Tag Set is to preserve the intellectual content of journals independent of the form in which that content was originally delivered. This Tag Set enables an archive to capture structural and semantic components of existing material without modeling any particular sequence or textual format."

An article marked up in JATS has <article> as its root element. Within this element, the following elements are permitted:

  • <front> -- a mandatory element that holds metadata for the parent journal and metadata for the article, including details for volume, issue, pages, title, contributors, copyright, publication date, keywords, etc.
  • <body> -- a non-repeatable element to hold the main textual and graphic content of the article, usually consisting of paragraphs and sections, which may themselves contain figures, tables, sidebars (boxed text), etc., all marked up in XML. [Note: A @specific-use attribute may be applied to the <body> element to indicate that the <body> does not contain the typical tagged narrative content. For example, a <body> could take a @specific-use attribute to indicate that the <body> is an untagged 'bag of words' for indexing purposes, contains undifferentiated OCR content, or is tagged as a single paragraph which is a text dump.]
  • <back> -- an optional element that might contain a list of references, glossary, or appendices.
  • <floats-group> -- an optional container element for floating objects (figures, tables, text boxes, graphics, etc.) that occur within an article but outside of the narrative flow of the article. Used by some publishers and archives to hold all such floating elements that are referenced in the article body or back matter. [Note: The element <floats-group> was significantly remodeled from early versions of this Tag Set. It is backward compatible with NLM DTD version 3.0 (NCBIArch_3), but not with earlier versions.]
  • Optionally, either:
    • a sequence of <response> elements -- a <response> element holds a commentary on the article itself.
    • or a sequence of <sub-article> elements -- a <sub-article> element holds a small article completely contained within the main article.
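
The top-level layout listed above can be checked mechanically. The following is a minimal Python sketch, not a substitute for validation against the JATS DTD or schema. It assumes that the four containers appear in the order listed and at most once each (the text above states this explicitly only for <body>); the sample instance and the function name check_top_level are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated instance for illustration only;
# a real article carries far more metadata.
SAMPLE = """<article>
  <front>
    <journal-meta><journal-id journal-id-type="publisher-id">example</journal-id></journal-meta>
    <article-meta><title-group><article-title>Example</article-title></title-group></article-meta>
  </front>
  <body><sec><title>Introduction</title><p>Text.</p></sec></body>
  <back><ref-list><ref id="r1"><mixed-citation>Example.</mixed-citation></ref></ref-list></back>
</article>"""

# Assumption: the containers occur in the order in which they are listed above.
CORE_ORDER = ["front", "body", "back", "floats-group"]
# Either a sequence of <response> or a sequence of <sub-article>, never both.
TRAILING = ("response", "sub-article")


def check_top_level(xml_text):
    """Return a list of problems in the top-level layout (empty if none found)."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        return [f"root element is <{root.tag}>, not <article>"]
    names = [child.tag for child in root]
    problems = []
    if names.count("front") != 1:
        problems.append("expected exactly one <front>")
    for name in CORE_ORDER[1:]:
        if names.count(name) > 1:
            problems.append(f"<{name}> appears more than once")
    core = [n for n in names if n in CORE_ORDER]
    if core != sorted(core, key=CORE_ORDER.index):
        problems.append("core elements are out of order")
    if all(t in names for t in TRAILING):
        problems.append("<response> and <sub-article> are both present")
    for n in names:
        if n not in CORE_ORDER and n not in TRAILING:
            problems.append(f"unexpected child <{n}>")
    return problems


print(check_top_level(SAMPLE))
```

An empty list means only that nothing in this coarse check failed; the content models inside each container are not examined.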

According to the JATS FAQ, NISO JATS version 1.0 is a fully backward-compatible revision to version 3.0 of the NCBI/NLM Tag Suite and its three article models (for Archiving, Publishing, and Authoring); see NCBIArch_3. Version 1.1 of the JATS Tag Suite is a fully backward-compatible revision of JATS version 1.0. See History Notes below.

The JATS DTDs are designed to hold the entire contents of a published article marked up in XML, but can also be used to record article metadata alone, accompanying article content held in another format, typically PDF.
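
A repository receiving JATS files may want to distinguish full-text instances from header-only instances of the kind just described. The Python sketch below uses a simple heuristic, which is an assumption of this example and not something the standard defines: an instance counts as full text if it has a <body> containing at least one child element. The function name has_full_text and both sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_full_text(xml_text):
    """Heuristic: True if <article> has a <body> with at least one child element."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    return body is not None and len(body) > 0


# Hypothetical header-only instance: metadata in <front>, content delivered as PDF.
HEADER_ONLY = "<article><front><article-meta/></front></article>"
# Hypothetical full-text instance.
FULL_TEXT = "<article><front><article-meta/></front><body><p>Text.</p></body></article>"

print(has_full_text(HEADER_ONLY), has_full_text(FULL_TEXT))
```

The heuristic says nothing about the quality of the body content; a <body> flagged with @specific-use as an OCR or text dump would still count as full text here.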

Production phase The Journal Archiving and Interchange Tag Set is generally used for exchanging works in their final state. Related Publishing and Authoring DTDs (built from the same tag set, but less permissive) are intended as initial- or middle-state formats for authors and publishers.
Relationship to other formats
    Has earlier version NCBIArch_3, NCBI/NLM Journal Archiving and Interchange DTD, version 3.0
    Has earlier version NCBIArch_2, NCBI/NLM Journal Archiving and Interchange DTD, version 2.x
    Has earlier version NCBIArch_1, NCBI/NLM Journal Archiving and Interchange DTD, version 1.x
    Has modified version BITS_2, BITS (Book Interchange Tag Suite), version 2.0
    Defined via XML_DTD, XML Document Type Definition (DTD)

Local use

LC experience or existing holdings

In January 2010, after a period of public comment, a regulation on Mandatory Deposit of Published Electronic Works Available Only Online was published in the Federal Register. eDeposit for eSerials began in late 2010, after the first demands for mandatory deposit of serials published only in electronic form were issued in October. Some publishers are depositing journal content in JATS format, often using JATS headers for article metadata with article content in PDF.

LC preference

Information on the formats that would be accepted for eDeposit was added to the edition of Circular 7B: Best Edition of Published Copyrighted Works for the Collections of the Library of Congress published in August 2010, which was still current as of February 2017. Circular 7B, issued prior to the publication of JATS, lists the NLM journal archiving DTD (predecessor to JATS) as first in order of preference and permits JATS under the second option.

The Library of Congress Recommended Formats Statement (RFS) for Textual Works - Electronic Serials, as of 2016, lists the JATS format first in order of preference. JATS and the predecessor NLM DTDs support metadata that satisfies the article-level metadata preferences expressed in RFS.


Sustainability factors

Disclosure Openly documented and freely downloadable. All components of the Journal Archiving and Interchange Tag Suite are in the public domain.
    Documentation

The ANSI/NISO Z39.96 specification is available from http://www.niso.org/workrooms/journalmarkup. Additional, non-normative documentation, including DTDs and schemas in W3C XML Schema and RNG, is available from https://jats.nlm.nih.gov/.

Adoption

The interest of NCBI/NLM in the development of a DTD for scholarly articles was to have an XML-based article format as the basis for publishers to contribute journal content to PubMed Central (PMC). A participating journal must provide PMC the full text of articles in an XML format that conforms to an acceptable journal article DTD (Document Type Definition). According to PubMed Central: General Tagging Practice, PMC will accept content tagged in any version of the NLM DTD or NISO JATS 1.0, using the Publishing variant.

Since 2005, Portico has worked with publishers and libraries to preserve e-journals through its E-Journal Preservation Service and has been active in the development and maintenance of JATS. Full text articles submitted to Portico in SGML or XML are transformed into JATS, using a normalized Portico profile; both the original and the Portico-specific forms are stored in the archive.

SciELO is a network, operating since 1998, that indexes and distributes open access journals online, initially from Brazil and now from 15 South American countries and South Africa. In April 2014, SciELO announced that it would be switching from using its own SGML DTD for Journals and Articles to an XML DTD based on JATS.

Another customized version of JATS is ISO Standards Tag Set (ISOSTS), used by ISO for preparing many of its standards for publication.

Many national archives and institutional repositories include XML as a preferred format, or in a list of formats for which there is high confidence in the ability to preserve and provide access, when a schema is available, the character encoding is explicitly stated, and an XSLT stylesheet for conversion to HTML exists. JATS satisfies these requirements. Examples: Deep Blue Preservation and Format Support Policy from the University of Michigan; National Archives of Norway; and the Florida Digital Archive.

A number of software tools exist for working with JATS, many free and open-source. The Wikipedia entry for Journal Article Tag Suite and JATSwiki | Tools list a variety of tools available for working with JATS files. Included are tools for converting to JATS from various word-processing or editing environments and tools for converting JATS files to distribution formats (e.g., HTML, EPUB, PDF). Examples include: eXtyles NLM, a commercial plugin for Microsoft Word to support preparation and export of articles in JATS format; JATSkit, a framework for use with the oXygen XML editor; a custom JATS writer for Pandoc, an open source document conversion library; JATS Preview Stylesheets; and NLM's PubReader, used in PubMed Central and also available as open source at github/NCBITools/PubReader.

Guidelines for practices that will facilitate interoperability and re-use have been developed, including PubMed Central Tagging Guidelines and JATS For Reuse. Both these resources provide examples and tools for checking against the guidelines.

    Licensing and patents No licensing or patent issues. The tag sets are in the public domain.
Transparency Rates highly for transparency. Text content for articles is in XML, and hence viewable in basic editors, web browsers, etc. Elements have understandable tag-names, and document instances are in natural reading order.
Self-documentation The DTD includes a rich set of elements for metadata at the article and journal level. The <article> element is expected to include the article content and full descriptive metadata.
External dependencies None.
Technical protection considerations None.

Quality and functionality factors

Text
Normal rendering Good support.
Integrity of document structure The logical structure of a document is an essential feature of JATS DTDs.
Integrity of layout and display The intent is to “preserve the intellectual content of journals independent of the form in which that content was originally delivered”.
Support for mathematics, formulae, etc. MathML and TeX math can be embedded. Integrity of rendering may be constrained by the capabilities of MathML and rendering tools. Various ways to represent chemical structures can be used.

File type signifiers and format identifiers

Tag Value Note
Filename extension xml For textual content files.
Magic numbers See note. As for many XML-based formats, there is no guaranteed magic number or other internal signature to identify the format automatically. See discussion in Notes below.

Notes

General

JATS subset variants: The Journal Article Tag Suite currently has three standard subsets to support different stages in the lifecycle of an article. For convenience, the variants have informally been assigned colors, which are used in display themes for documentation.

  • Journal Archiving and Interchange (green) is the most permissive of the Tag Sets. The Archiving and Interchange Tag Set defines elements and attributes that describe the content and metadata of journal articles, including research and non-research articles, letters, editorials, and book and product reviews. The Tag Set allows for descriptions of the full article content or just the article header metadata. It also allows for preservation of the sequence of content and generated text. See https://jats.nlm.nih.gov/archiving/. The focus of this description is on this variant.
  • The Journal Publishing Tag Set (blue) is a moderately prescriptive set, optimized for archives that wish to regularize and control their content, rather than accept the sequence and arrangement presented to them by any particular publisher. The Journal Publishing Tag Set is also intended for use by publishers for the initial XML tagging of journal material, usually as converted from an authoring form like Microsoft Word. See https://jats.nlm.nih.gov/publishing/. In many cases, this variant will also be appropriate for archiving.
  • Article Authoring (pumpkin) is the most prescriptive of the Tag Sets. The Article Authoring Tag Set is optimized for authorship of new journal articles, where regularization and control of content is important, and where it is useful rather than harmful to have only one way to tag a structure. This Tag Set is more prescriptive than descriptive and includes many elements whose content must occur in a specified order. Because it is for use before acceptance, this tag set does not incorporate journal metadata. See https://jats.nlm.nih.gov/articleauthoring/.

The table in ANSI/NISO Z39.96, Clause 7: Tag Suite Components identifies which elements and attributes are permitted in each of the three variants.

Identifying JATS version and variant used in an article file: Although the beginning of a file conforming to this family of DTDs will usually make it obvious which chronological version of the tag suite, and which article model variant derived from the suite, the file uses, there is no guaranteed magic number or other signature to identify the format automatically. If the file was generated using the DTD (rather than the W3C XML Schema), it is likely to have the following string or something similar near the beginning of the file:

  • <!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1 20151215//EN"

According to PubMed Central: Minimum XML Requirements, PubMed Central validates submitted journal content data based on this PUBLIC ID in the DOCTYPE declaration in each source file. However, PubMed Central: General Tagging Practice states, "we strongly prefer the DOCTYPE declaration for associating DTDs and the @schemaLocation or @noNamespaceSchemaLocation attributes for W3C XML Schema association, but we will accept the <?xml-model?> processing instruction for DTD, W3C Schema, RELAX NG, and RELAX NG compact syntax," and provides rules for encoding such processing instructions.
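
Because identification rests on the DOCTYPE declaration rather than a magic number, a format-identification tool can only make a best-effort guess. The Python sketch below extracts the variant, version, and date from a public identifier shaped like the one quoted above. The regular expression is modeled on that single example and is an assumption of this sketch; identifiers for other variants or versions may be worded differently, so a non-match should be read as "unknown", not as "not JATS". The names sniff_jats and sniff_file are hypothetical.

```python
import re

# Pattern modeled on the one public identifier quoted above (an assumption:
# other JATS public identifiers may not follow exactly this wording).
PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+article\s+PUBLIC\s+'
    r'"-//NLM//DTD JATS \(Z39\.96\) (?P<variant>.+?) DTD '
    r'v(?P<version>\S+) (?P<date>\d{8})//EN"'
)


def sniff_jats(head):
    """Return variant, version, and date from a JATS DOCTYPE, or None if absent."""
    match = PUBLIC_ID.search(head)
    if match is None:
        return None
    return match.groupdict()


def sniff_file(path, size=4096):
    """Apply sniff_jats to the first `size` characters of a file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sniff_jats(handle.read(size))


example = (
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) '
    'Journal Archiving and Interchange DTD v1.1 20151215//EN"'
)
print(sniff_jats(example))
```

Files associated with a W3C XML Schema or an <?xml-model?> processing instruction carry no DOCTYPE and would need a separate check, which this sketch does not attempt.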

History

On April 19, 2006, the Library of Congress and the British Library jointly announced support and advocacy for the NLM Archival DTD, the predecessor of JATS. To quote from the press release, the two institutions "will work with the National Library of Medicine to ensure the open and transparent evolution of the NLM DTD standard by encouraging early adoption by an internationally recognized standards body." Version 3 of the NLM DTD, published in November 2008, was developed with formal standardization in view. The idea was to make the Tag Sets as logical, internally consistent, and complete as possible going forward.

The transition to standardization through NISO as "Standardized Markup for Journal Articles" took some months, with the newly constituted NISO Working Group starting work in late 2009. The original plan had been to submit version 3.0 as a draft standard to NISO, but comments received on version 3.0 in the interim led the group to choose to update the Tag Suite and the three article models. It was decided that the Journal Article Tag Suite under NISO should start with a new numbering scheme and adopt the JATS acronym more formally. After release as a Draft Standard for Trial Use in March 2011, NISO Z39.96 (JATS) version 1.0 was published in 2012. According to the Version 1.0 Change Report, changes between NLM version 3.0 and JATS version 1.0 included: (a) addition of a <contrib-id> to hold identifiers such as an ORCID iD; (b) an element <issn-l> to hold a linking ISSN (ISSN-L) that connects ISSNs for a publication available on several media; (c) more formal integration of the two commonly used models for tables, NISO JATS (XHTML-inspired) Tables and OASIS (CALS) Tables; and (d) many other small or more technical enhancements. These changes were all backward-compatible with version 3.0 of the NLM DTD, described at NCBIArch_3.

JATS is maintained in a continuous fashion with the expectation of scheduled formal updates to the standard. Following the initial publication of JATS version 1.0 in 2012, three Committee Drafts (1.1d1, 1.1d2, 1.1d3) were released for use prior to the formal 2015 update as JATS version 1.1. JATS 1.1 added optional elements and attributes based on (a) user requests between August 2012 and 2015; (b) support for metadata requirements associated with industry initiatives (CrossRef, NISO Open Access Data, Access and License Indicators, and Force11 Data Citation Principles); (c) improved internationalization; (d) a new element <code> specifically for snippets of program code; and (e) sub-elements for an address. Other new features were the option to embed MathML 3.0 or MathML 2.0 (but not both in the same article) and to add abstracts and keywords to figures, graphics, etc.


Last Updated: Thursday, 16-Mar-2017 15:21:24 EDT