Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | JATS: Journal Article Tag Suite, versions 1.0 and 1.1, NISO Z39.96:2012-2015 |
---|---|
Description |
ANSI/NISO Z39.96 is a specification for Standardized Markup for Journal Articles, commonly known as JATS (Journal Article Tag Suite). JATS is based on a tag suite and DTDs originally developed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM). See NCBIArch_1 for information on this predecessor format, often referred to as the "NLM DTD." Since 2009, JATS has been maintained under the auspices of NISO and given the number Z39.96; see http://www.niso.org/workrooms/journalmarkup. The JATS Tag Suite has three subsets: one intended as an interchange format for publishers to transmit final journal content for Archiving; one intended for Publishing; and a third intended to encourage Authoring using practices that support the Publishing and Archiving stages of an article's lifecycle. See Notes below for more detail on the three variants. This description focuses on the Archiving and Interchange subset, which is the most permissive and inclusive of the three. The Tag Library for the latest version of this subset (version 1.1 as of March 2017) can be found at https://jats.nlm.nih.gov/archiving/. As stated on that page, "the intent of the Journal Archiving and Interchange Tag Set is to preserve the intellectual content of journals independent of the form in which that content was originally delivered. This Tag Set enables an archive to capture structural and semantic components of existing material without modeling any particular sequence or textual format." An article marked up in JATS has <article> as its root element. Within this element, the following elements are permitted:
According to the JATS FAQ, NISO JATS version 1.0 is a fully backward-compatible revision of version 3.0 of the NCBI/NLM Tag Suite and its three article models (for Archiving, Publishing, and Authoring); see NCBIArch_3. Version 1.1 of the JATS Tag Suite is a fully backward-compatible revision of JATS version 1.0. See History Notes below. The JATS DTDs are designed to hold the entire contents of a published article marked up in XML, but can also be used to record article metadata accompanying article content in another format, typically PDF. |
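A quick structural check can be useful when ingesting JATS files. The Python sketch below, using only the standard library, confirms that the root element is <article> and that its top-level children appear in the order front, body?, back?, floats-group?, (sub-article* | response*). That content model is our own summary of the JATS 1.1 Archiving Tag Library and is an assumption here; the function name check_top_level is hypothetical, and the check is no substitute for validating against the official DTD or schemas.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Assumed top-level content model of <article> (JATS 1.1 Archiving):
#   front, body?, back?, floats-group?, (sub-article* | response*)
# Each known child is mapped to one letter so the order can be checked by regex.
CODES = {
    "front": "f",
    "body": "b",
    "back": "k",
    "floats-group": "g",
    "sub-article": "s",
    "response": "r",
}
TOP_LEVEL_PATTERN = re.compile(r"fb?k?g?(?:s*|r*)")


def check_top_level(xml_text: str) -> List[str]:
    """Return problems found in the top-level structure of a JATS article."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        return [f"root element is <{root.tag}>, expected <article>"]
    problems = []
    codes = []
    for child in root:
        if child.tag in CODES:
            codes.append(CODES[child.tag])
        else:
            problems.append(f"unexpected top-level element <{child.tag}>")
    # The known children, in document order, must match the assumed model.
    if not TOP_LEVEL_PATTERN.fullmatch("".join(codes)):
        problems.append("top-level elements are missing or out of order")
    return problems


SAMPLE = """<article article-type="research-article" dtd-version="1.1">
  <front>
    <article-meta>
      <title-group><article-title>Example</article-title></title-group>
    </article-meta>
  </front>
  <body><p>Body text.</p></body>
  <back/>
</article>"""

if __name__ == "__main__":
    print(check_top_level(SAMPLE))  # an empty list means no problems found
```

Mapping element names to single letters keeps the ordering rule readable as one regular expression; a file carrying metadata only (a <front> with the article content in PDF) also passes, since every element after <front> is optional.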
Production phase | The Journal Archiving and Interchange Tag Set is generally used for exchanging works in their final state. Related Publishing and Authoring DTDs (built from the same tag set, but less permissive) are intended as initial- or middle-state formats for authors and publishers. |
Relationship to other formats | |
Has earlier version | NCBIArch_3, NCBI/NLM Journal Archiving and Interchange DTD, version 3.0 |
Has earlier version | NCBIArch_2, NCBI/NLM Journal Archiving and Interchange DTD, version 2.x |
Has earlier version | NCBIArch_1, NCBI/NLM Journal Archiving and Interchange DTD, version 1.x |
Has later version | JATS 1.2. Not described separately at this time. See https://jats.nlm.nih.gov/publishing/1.2/dtd.html for details. |
Has later version | JATS 1.3. Not described separately at this time. See https://jats.nlm.nih.gov/publishing/1.3/ for details. |
Has modified version | BITS_2, BITS (Book Interchange Tag Suite), version 2.0 |
Defined via | XML_DTD, XML Document Type Definition (DTD) |
LC experience or existing holdings |
In January 2010, after a period of public comment, a regulation on Mandatory Deposit of Published Electronic Works Available Only Online was published in the Federal Register. eDeposit for eSerials began in late 2010, after the first demands for mandatory deposit of serials published only in electronic form were issued in October 2010. Some publishers are depositing journal content in JATS format, often using JATS headers for article metadata with the article content in PDF. |
---|---|
LC preference |
The information related to formats that would be accepted for eDeposit was added to the edition of Circular 7B: Best Edition of Published Copyrighted Works for the Collections of the Library of Congress published in August 2010 and still current as of February 2017. Circular 7B, issued prior to the publication of JATS, lists the NLM journal archiving DTD (predecessor to JATS) as first in order of preference and permits JATS under the second option. The Library of Congress Recommended Formats Statement (RFS) for Section iii. Textual Works - Electronic Serials, lists the JATS format first in order of preference. JATS and the predecessor NLM DTDs support metadata that satisfies the article-level metadata preferences expressed in RFS. |
Disclosure | Openly documented and freely downloadable. All components of the Journal Archiving and Interchange Tag Suite are in the public domain. |
---|---|
Documentation |
The ANSI/NISO Z39.96 specification is available from http://www.niso.org/workrooms/journalmarkup. Additional non-normative documentation, including the DTDs and schemas in W3C XML Schema and RELAX NG (RNG) formats, is available from https://jats.nlm.nih.gov/. |
Adoption |
The interest of NCBI/NLM in developing a DTD for scholarly articles was to have an XML-based article format as the basis for publishers to contribute journal content to PubMed Central (PMC). A participating journal must provide PMC with the full text of articles in an XML format that conforms to an acceptable journal article DTD (Document Type Definition). According to PubMed Central: General Tagging Practice, PMC will accept content tagged in any version of the NLM DTD or NISO JATS 1.0, using the Publishing variant. Since 2005, Portico has worked with publishers and libraries to preserve e-journals through its E-Journal Preservation Service and has been active in the development and maintenance of JATS. Full-text articles submitted to Portico in SGML or XML are transformed into JATS using a normalized Portico profile; both the original and the Portico-specific forms are stored in the archive. SciELO, a network operating since 1998, indexes and distributes open access journals online, initially from Brazil and now from 15 South American countries and South Africa. In April 2014, SciELO announced that it would switch from its own SGML DTD for Journals and Articles to an XML DTD based on JATS. Another customized version of JATS is the ISO Standards Tag Set (ISOSTS), used by ISO to prepare many of its standards for publication. Many national archives and institutional repositories list XML as a preferred format, or among the formats for which there is high confidence in the ability to preserve and provide access, provided that a schema is available, the character encoding is explicitly stated, and an XSLT stylesheet for conversion to HTML exists. JATS satisfies these requirements. Examples: Deep Blue Preservation and Format Support Policy from the University of Michigan; National Archives of Norway (Link via Internet Archive); and the Florida Digital Archive. A number of software tools exist for working with JATS, many of them free and open-source.
The Wikipedia entry for Journal Article Tag Suite and the JATSwiki Tools page (Link via Internet Archive) list a variety of tools available for working with JATS files. Included are tools for converting to JATS from various word-processing or editing environments and tools for converting JATS files to distribution formats (e.g., HTML, EPUB, PDF). Examples include: eXtyles NLM, a commercial plugin for Microsoft Word that supports preparation and export of articles in JATS format; JATSkit, a framework for use with the oXygen XML editor; a custom JATS writer for Pandoc, an open source document conversion library; the JATS Preview Stylesheets; and NLM's PubReader, used in PubMed Central and also available as open source at github/NCBITools/PubReader. Guidelines for practices that facilitate interoperability and re-use have been developed, including PubMed Central Tagging Guidelines and JATS For Reuse. Both resources provide examples and tools for checking content against the guidelines. |
Licensing and patents | No licensing or patent issues. The tag sets are in the public domain. |
Transparency | Rates highly for transparency. Text content for articles is in XML, and hence viewable in basic editors, web browsers, etc. Elements have understandable tag-names, and document instances are in natural reading order. |
Self-documentation |
The DTD includes a rich set of elements for metadata at the article and journal level. The <article> element is expected to include the article content and full descriptive metadata.
Accessibility features: According to the BITS Book Interchange Tag Suite, version 2.1 documentation, the JATS format has moderate support for accessibility features. The JATS tag set, like BITS, provides two elements that can be applied to graphics, images, tables, and figures to provide content descriptions:
JATS also includes the following accessibility options:
|
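To illustrate how content descriptions can be audited, the following Python sketch lists <graphic> and <inline-graphic> elements that have neither an <alt-text> nor a <long-desc> child, the two JATS elements for textual descriptions of visual content. It assumes that each graphic is identified by its xlink:href attribute, and it deliberately ignores descriptions attached to an enclosing <fig> or <table-wrap>, so it may over-report. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
DESCRIPTION_TAGS = ("alt-text", "long-desc")


def graphics_without_descriptions(xml_text: str) -> List[str]:
    """List xlink:href values of graphics lacking <alt-text>/<long-desc>.

    Simplification: only direct children of each graphic are checked, so a
    description attached to an enclosing <fig> or <table-wrap> is not seen.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for tag in ("graphic", "inline-graphic"):
        for element in root.iter(tag):
            has_description = any(
                element.find(d) is not None for d in DESCRIPTION_TAGS
            )
            if not has_description:
                missing.append(element.get(XLINK_HREF, "(no xlink:href)"))
    return missing


SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta/></front>
  <body>
    <fig id="f1">
      <graphic xlink:href="fig1.tif">
        <alt-text>Bar chart of deposits by year</alt-text>
      </graphic>
    </fig>
    <fig id="f2">
      <graphic xlink:href="fig2.tif"/>
    </fig>
  </body>
</article>"""

if __name__ == "__main__":
    print(graphics_without_descriptions(SAMPLE))  # ['fig2.tif']
```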
External dependencies | None. |
Technical protection considerations | None. |
Text | |
---|---|
Normal rendering | Good support. |
Integrity of document structure | The logical structure of a document is an essential feature of JATS DTDs. |
Integrity of layout and display | The intent is to “preserve the intellectual content of journals independent of the form in which that content was originally delivered”. |
Support for mathematics, formulae, etc. | MathML and TeX math can be embedded. Integrity of rendering may be constrained by the capabilities of MathML and rendering tools. Various ways to represent chemical structures can be used. |
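When judging whether embedded mathematics is likely to render, it helps to know which encodings each formula carries. This Python sketch is a hypothetical helper, not part of any JATS tooling. For each <disp-formula> or <inline-formula>, it reports whether the formula contains MathML (a math element in the MathML namespace) and/or TeX (<tex-math>), including encodings offered side by side inside an <alternatives> wrapper. Formulas without an id attribute get a generated key.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MATHML_MATH = "{http://www.w3.org/1998/Math/MathML}math"


def formula_encodings(xml_text: str) -> Dict[str, List[str]]:
    """Map each formula (by id) to the math encodings found inside it."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for tag in ("disp-formula", "inline-formula"):
        for index, formula in enumerate(root.iter(tag)):
            key = formula.get("id", f"{tag}-{index}")
            found = []
            # Search all descendants, so encodings nested in <alternatives> count.
            if formula.find(".//" + MATHML_MATH) is not None:
                found.append("MathML")
            if formula.find(".//tex-math") is not None:
                found.append("TeX")
            result[key] = found
    return result


SAMPLE = r"""<article xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <front><article-meta/></front>
  <body>
    <disp-formula id="e1">
      <alternatives>
        <tex-math>E = mc^2</tex-math>
        <mml:math><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mi>m</mml:mi>
          <mml:msup><mml:mi>c</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math>
      </alternatives>
    </disp-formula>
    <p>Inline <inline-formula id="e2"><tex-math>\alpha</tex-math></inline-formula>.</p>
  </body>
</article>"""

if __name__ == "__main__":
    print(formula_encodings(SAMPLE))  # {'e1': ['MathML', 'TeX'], 'e2': ['TeX']}
```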
Tag | Value | Note |
---|---|---|
Filename extension | xml | For textual content files. |
Magic numbers | See note. | As for many XML-based formats, there is no guaranteed magic number or other internal signature to identify the format automatically. See discussion in Notes below. |
PRONOM PUID | See note. | No PRONOM PUID as of May 2024. |
Wikidata Title ID | Q17060731 | See https://www.wikidata.org/wiki/Q17060731. |
General |
JATS subset variants: The Journal Article Tag Suite currently has three standard subsets to support different stages in the lifecycle of an article. For convenience, the variants have informally been assigned colors, which are used in display themes for documentation.
The table in ANSI/NISO Z39.96, Clause 7: Tag Suite Components identifies which elements and attributes are permitted in each of the three variants.
Identifying JATS version and variant used in an article file: Although the beginning of a file conforming to this family of DTDs usually makes it obvious which chronological version of the tag suite it uses, and which article model variant derived from the suite, there is no guaranteed magic number or other signature to identify the format automatically. If the file was generated using the DTD (rather than the W3C XML Schema), it is likely to have the following string or something similar near the beginning of the file:
According to PubMed Central: Minimum XML Requirements (Link via Internet Archive), PubMed Central validates submitted journal content based on this PUBLIC ID in the DOCTYPE declaration of each source file. However, PubMed Central: General Tagging Practice states, "we strongly prefer the DOCTYPE declaration for associating DTDs and the @schemaLocation or @noNamespaceSchemaLocation attributes for W3C XML Schema association, but we will accept the <?xml-model?> processing instruction for DTD, W3C Schema, RELAX NG, and RELAX NG compact syntax," and it provides rules for encoding such processing instructions. |
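Because there is no magic number, identification in practice relies on heuristics applied to the first part of a file. The Python sketch below is one such heuristic, built on stated assumptions. It reads the PUBLIC identifier from a DOCTYPE declaration and assumes JATS public identifiers follow the pattern in the sample string (which is illustrative, not quoted from the standard). It also reads the dtd-version attribute on <article> when present, and records which schema-association mechanisms appear: DOCTYPE, a schemaLocation attribute, or <?xml-model?>. All names in the code are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# PUBLIC identifier of a DOCTYPE declaration whose root element is <article>.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+article\s+PUBLIC\s+"([^"]+)"')
# Assumed shape of a JATS public identifier:
#   "... JATS (Z39.96) <variant name> DTD v<version> <date>//EN"
JATS_PUBID_RE = re.compile(r"JATS \(Z39\.96\) (?P<variant>.+?) DTD v(?P<version>[\w.]+)")
# dtd-version attribute on the root <article> start tag.
DTD_VERSION_RE = re.compile(r'<article\b[^>]*?\sdtd-version="([^"]+)"')
SCHEMA_LOCATION_RE = re.compile(r"(?:noNamespaceSchemaLocation|schemaLocation)\s*=")


@dataclass
class JatsGuess:
    public_id: Optional[str] = None
    variant: Optional[str] = None
    version: Optional[str] = None
    dtd_version_attr: Optional[str] = None
    associations: List[str] = field(default_factory=list)


def identify_jats(head: str) -> JatsGuess:
    """Heuristically identify JATS variant and version from the start of a file."""
    guess = JatsGuess()
    doctype = DOCTYPE_RE.search(head)
    if doctype:
        guess.public_id = doctype.group(1)
        guess.associations.append("DOCTYPE")
        pubid = JATS_PUBID_RE.search(guess.public_id)
        if pubid:
            guess.variant = pubid.group("variant")
            guess.version = pubid.group("version")
    if SCHEMA_LOCATION_RE.search(head):
        guess.associations.append("W3C XML Schema location attribute")
    if "<?xml-model" in head:
        guess.associations.append("xml-model processing instruction")
    attr = DTD_VERSION_RE.search(head)
    if attr:
        guess.dtd_version_attr = attr.group(1)
    return guess


# Illustrative file header (assumed, not taken from the standard).
SAMPLE_HEAD = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and '
    'Interchange DTD v1.1 20151215//EN" "JATS-archivearticle1.dtd">\n'
    '<article xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.1">'
)

if __name__ == "__main__":
    print(identify_jats(SAMPLE_HEAD))
```

A file with none of these signals may still be JATS; the heuristic only narrows the candidates, and a match should be confirmed by validation.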
---|---|
History |
On April 19, 2006, the Library of Congress and the British Library jointly announced support and advocacy for the NLM Archival DTD, the predecessor of JATS. To quote from the press release, the two institutions "will work with the National Library of Medicine to ensure the open and transparent evolution of the NLM DTD standard by encouraging early adoption by an internationally recognized standards body." Version 3 of the NLM DTD, published in November 2008, was developed with formal standardization in view. The idea was to make the Tag Sets as logical, internally consistent, and complete as possible going forward. The transition to standardization through NISO as "Standardized Markup for Journal Articles" took some months, with the newly constituted NISO Working Group starting work in late 2009. The original plan had been to submit version 3.0 as a draft standard to NISO, but comments received on version 3.0 in the interim led the group to update the Tag Suite and the three article models. It was also decided that the Journal Article Tag Suite under NISO should start with a new numbering scheme and adopt the JATS acronym more formally. After release as a Draft Standard for Trial Use in March 2011, NISO Z39.96 (JATS) version 1.0 was published in 2012. According to the Version 1.0 Change Report, changes between NLM version 3.0 and JATS version 1.0 included: (a) a new element <contrib-id> to hold contributor identifiers such as an ORCID iD; (b) a new element <issn-l> to hold a linking ISSN (ISSN-L) that connects the ISSNs of a publication available on several media; (c) more formal integration of the two commonly used table models, NISO JATS (XHTML-inspired) tables and OASIS (CALS) tables; and (d) many other small or more technical enhancements. These changes were all backward-compatible with version 3.0 of the NLM DTD, described at NCBIArch_3. JATS is maintained continuously, with scheduled formal updates to the standard expected.
Following the initial publication of JATS version 1.0 in 2012, three Committee Drafts (1.1d1, 1.1d2, 1.1d3) were released for use before the formal 2015 update to JATS version 1.1. JATS 1.1 added optional elements and attributes, including: (a) additions based on user requests received between August 2012 and 2015; (b) support for metadata requirements associated with industry initiatives (CrossRef, NISO Open Access Data, Access and License Indicators, and Force11 Data Citation Principles); (c) features for improved internationalization; (d) a new element <code> specifically for snippets of program code; and (e) sub-elements for an address. Other new features were the option to embed either MathML 3.0 or MathML 2.0 (but not both in the same article) and the ability to add abstracts and keywords to figures, graphics, and similar objects. |
|