Sustainability of Digital Formats: Planning for Library of Congress Collections


HyperText Markup Language (HTML) 5


Identification and description

Full name HyperText Markup Language (HTML) 5, including all 5.x versions
Description

HyperText Markup Language (HTML) is the standard markup language for creating web pages and web applications. This format description is for HTML 5, which was standardized in two coordinated efforts. One is by the Web Hypertext Application Technology Working Group (WHATWG), which maintains a specification for HTML as a modularized "living standard" at https://html.spec.whatwg.org/. The second is at the World Wide Web Consortium (W3C), where a series of snapshots of well-supported modules has been compiled and published as W3C Recommendations: 5.0 (2014); 5.1 (2016); 5.2 (2017). The latest W3C Recommendation is the result of a May 2019 agreement between W3C and the WHATWG to develop a single version of the HTML specification. The specifics are set out in the text of that agreement, and the latest recommendations for HTML can be found at https://html.spec.whatwg.org/multipage/. See Notes below for more on the relationship between W3C and WHATWG.

A key objective for the WHATWG was backwards compatibility to ensure good rendering of existing websites. Another objective was that the specification be detailed enough that implementers such as browser developers can achieve complete interoperability without reverse-engineering. With these objectives in mind, the dependence of HTML on SGML was eliminated for HTML 5 and the specification includes details on how browsers are to render pages and how parsers are to handle shortcuts such as missing end tags. For example, the specification states that "A <tr> element’s end tag may be omitted if the <tr> element is immediately followed by another <tr> element, or if there is no more content in the parent element." HTML parsers/validators will accept these omissions. The specification also incorporates two serializations, the backwards-compatible HTML serialization (documented in 8. The HTML syntax) and a stricter XML-based serialization (documented in 9. The XML syntax), sometimes referred to as XHTML5, and compatible with the earlier W3C Recommendations for XHTML. The XML-based serialization requires the use of a different Internet media type and has different rules for declarations at the beginning of the document.
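The omitted-end-tag rule quoted above can be illustrated with a minimal Python sketch. It assumes only the standard-library html.parser module, which is a tokenizer and not a conforming HTML 5 parser; the class and function names (RowCollector, table_rows) are hypothetical, and the handling covers only the <tr> case described in the quotation, not the full set of optional-tag rules in the specification.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect table rows, treating an omitted </tr> as implied."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently open, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def _close_row(self):
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> implies the end tag of the previous <tr>.
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # A new cell implies that the previous cell has ended.
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag in ("tr", "tbody", "thead", "tfoot", "table"):
            # Either an explicit </tr>, or no more content in the parent.
            self._close_row()

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def table_rows(markup):
    """Return the rows of cell text found in a fragment of HTML markup."""
    parser = RowCollector()
    parser.feed(markup)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows
```

For example, table_rows("<table><tr><td>a<td>b<tr><td>c</table>") yields two rows even though the markup contains no </tr> or </td> tags. A production tool would normally rely on a parser that implements the full parsing algorithm from the specification.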

HTML 5 incorporated major changes and extensions over HTML 4.01. Major motivations for the changes are described in a slide presentation by Olle Olsen of W3C in 2008, when the first working draft from W3C for HTML 5 was published. Detailed changes, such as new elements and attributes, are documented in HTML5 Differences from HTML4 (2014). Note that the HTML 5 specifications use lower case for element names. Some of the most significant extensions include:

  • Several elements were added for more explicit representation of document structure, including: <section>; <main>; <article>; <aside>; <header>; <footer>; and <figure>.
  • MathML and SVG markup can be used inside a document in the HTML serialization, using <math> and <svg> elements. Incorporation of MathML and SVG was already technically feasible in XHTML by employing different XML namespaces. For convenience, new named character references were added, including all named character references from MathML.
  • New <audio> and <video> elements were introduced to avoid the need for format-specific plug-ins and to provide a consistent set of player functions. See Thoughts on Flash in the Wikipedia entry for HTML 5 for discussion of the Adobe plug-in that had been widely used for video. The new <canvas> element provides scripts with a rectangular area, specified in pixels, for rendering graphs, game graphics, or bit-mapped art on the fly.
  • Many detailed changes were introduced to provide more effective support for interactivity, for example, to support editing within web pages, with features such as drag-and-drop and spellchecking.
  • Many new HTML APIs available to developers, particularly for creating interactive applications, were introduced. Several existing APIs were extended, changed or obsoleted.
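As a rough illustration of how the elements named in the list above might be detected in a document, the following Python sketch counts their start tags. It assumes the standard-library html.parser module; the grouping labels and the names FEATURES, FeatureCounter, and html5_features are hypothetical choices for this example, and the element sets are limited to those mentioned above.

```python
from collections import Counter
from html.parser import HTMLParser

# Element names taken from the list above; the group labels are illustrative.
FEATURES = {
    "structure": {"section", "main", "article", "aside", "header", "footer", "figure"},
    "embedded markup": {"math", "svg"},
    "media": {"audio", "video", "canvas"},
}


class FeatureCounter(HTMLParser):
    """Count start tags that belong to one of the feature groups."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names before calling this method.
        for group, names in FEATURES.items():
            if tag in names:
                self.counts[group] += 1


def html5_features(markup):
    """Return a dict mapping each feature group found to its tag count."""
    parser = FeatureCounter()
    parser.feed(markup)
    parser.close()
    return dict(parser.counts)
```

A count of zero for every group does not show that a document predates HTML 5; it only shows that none of these particular elements appear in the markup as delivered.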
Production phase The primary use of HTML is as a final-state format for web pages made available on the Internet. Early HTML files were often created directly in a text editor, but by the time HTML 5 was the accepted standard for web pages, most pages were created in visual editors like Adobe's Dreamweaver or in enterprise-level content management systems. Many pages rely on popular "frameworks," software toolkits for building web pages that make extensive use of CSS, JavaScript, or both. Examples are Bootstrap and AngularJS. See Notes on Frameworks below.
Relationship to other formats
    Subtype of HTML_family, HTML File Format Family
    Has earlier version HTML_4_01, HyperText Markup Language (HTML) 4.01
    Has earlier version XHTML_1_1, Extensible HyperText Markup Language (XHTML) 1.1, Module-based XHTML
    May contain EPUB 3.0, EPUB, Electronic Publication, Version 3.0 (2011). ISO/IEC TS 30135:2014. EPUB 3 uses the XML syntax for HTML, i.e. the successor to XHTML.
    Used by WebVTT, Web Video Text Tracks Format (WebVTT). WebVTT files are created for displaying timed text in connection with the HTML5 <track> element.
    Used by TTML1, Timed Text Markup Language Version 1 (TTML1).

The TTML1 specification states “While TTML is not expressly designed for direct (embedded) integration into an HTML or a SMIL document instance, such integration is not precluded.”

TTML may provide a "standard content format to reference from a <track> element in an HTML5 document."


Local use

LC experience or existing holdings The Library of Congress home page archived on January 11, 2011 used XHTML 1.0 Transitional. For the new design introduced on January 12, 2011, HTML 5 was used. See also HTML_family.
LC preference See HTML_family.

Sustainability factors

Disclosure

HTML 5, developed and published as a "living standard" under the auspices of WHATWG, is a non-proprietary format, openly developed and published, and freely implementable.

    Documentation

The most current specification for HTML 5 is the HTML Living Standard from WHATWG. Between October 2014 and December 2017, W3C published a sequence of Recommendations incorporating patches and enhancements from the WHATWG specification. These were adopted to resolve bugs registered against the previous W3C HTML 5.x specification and to represent more accurately the implementations in browsers or other user agents. Under a 2019 agreement between W3C and the WHATWG, W3C no longer independently publishes HTML specifications. As a result, the URL for the latest W3C Recommendation [ https://www.w3.org/TR/html/ ] now resolves to the HTML Living Standard from WHATWG.

Adoption

According to W3Techs (Web Technology Surveys), 87% of websites based on HTML used HTML 5 in early March 2018. That statistic omits the roughly 20% of all websites that used XHTML. Support in browsers for individual elements can be assessed via CanIUse or in the tables at the bottom of entries for individual elements in the MDN HTML elements reference.

    Licensing and patents

No concerns. See HTML_family.

Transparency

HTML 5 files can be opened and viewed in text editors. The increased use of CSS and JavaScript in HTML 5 frequently results in less transparency than in earlier HTML versions. The use of JavaScript, including frameworks such as AngularJS or Ember.js, can often result in HTML files that are hard to interpret or render outside their original systems. See Notes on Frameworks below.

The transparency of image and video files intended for incorporating into the rendered display depends on the formats of those files. Note that such files are not stored within the HTML file, but referenced by URL. The URL may be absolute or relative to the HTML file.
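Because image and video files are referenced rather than embedded, assessing or preserving a page means locating those files first. The following Python sketch, assuming only the standard library, collects src attributes and resolves relative URLs against the address of the HTML file. The names MEDIA_TAGS, MediaLinkCollector, and media_urls are hypothetical, the example.org addresses are placeholders, and the sketch deliberately ignores the <base> element, srcset attributes, and references made from CSS or scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements whose src attribute points at an external image, audio, or video file.
MEDIA_TAGS = {"img", "audio", "video", "source"}


class MediaLinkCollector(HTMLParser):
    """Collect media URLs, resolved against the URL of the HTML file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                # urljoin keeps absolute URLs unchanged and resolves relative ones.
                self.urls.append(urljoin(self.base_url, src))


def media_urls(markup, base_url):
    """Return absolute URLs for media referenced by src attributes in markup."""
    collector = MediaLinkCollector(base_url)
    collector.feed(markup)
    collector.close()
    return collector.urls
```

Called with a page address of https://www.example.org/about/index.html, a relative reference such as img/logo.png resolves to https://www.example.org/about/img/logo.png, while an absolute reference is returned as written.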

See also HTML_family.

Self-documentation See HTML_family.
External dependencies See HTML_family.
Technical protection considerations See HTML_family.

Quality and functionality factors

Text
Normal rendering See HTML_family.

File type signifiers and format identifiers

HTML serialization

Tag | Value | Note
Filename extension | html, htm | Extensions typically employed with the usual HTML 5 serialization.
Internet Media Type | text/html | The media type for the usual HTML 5 serialization is text/html.
Magic numbers | <!DOCTYPE html> | The specification for the HTML serialization of HTML 5 requires that a conforming document have a document type declaration of <!DOCTYPE html>, matched without case sensitivity.
Pronom PUID | fmt/471 | See http://www.nationalarchives.gov.uk/PRONOM/fmt/471.
Wikidata Title ID | Q2053 | See https://www.wikidata.org/wiki/Q2053.

XML serialization

Tag | Value | Note
Filename extension | xhtml, xht | Extensions sometimes used for documents in the XML serialization of HTML 5.
Internet Media Type | application/xhtml+xml | The recommended media type for use with the XML serialization of HTML 5. See 2002 registration at IETF RFC 3236 and its 2014 update at https://www.iana.org/assignments/media-types/application/xhtml+xml.
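The signifiers listed above could be applied in a simple identification routine. The Python sketch below, assuming only the standard library, maps the listed filename extensions to their media types and tests for the HTML 5 document type declaration without case sensitivity. The names MEDIA_TYPES, DOCTYPE, media_type_for, and has_html5_doctype are hypothetical. Tolerating leading whitespace and a byte order mark is a convenience of this sketch, and comments preceding the declaration are not handled.

```python
import re
from pathlib import PurePosixPath

# Extension-to-media-type pairs as listed for the two serializations.
MEDIA_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
}

# <!DOCTYPE html>, matched without case sensitivity. Legacy declarations that
# carry a public identifier after "html" do not match this pattern.
DOCTYPE = re.compile(r"^\ufeff?\s*<!doctype\s+html\s*>", re.IGNORECASE)


def media_type_for(filename):
    """Return the media type suggested by the filename extension, or None."""
    return MEDIA_TYPES.get(PurePosixPath(filename).suffix.lower())


def has_html5_doctype(text):
    """Return True if the text begins with the HTML 5 doctype declaration."""
    return DOCTYPE.match(text) is not None
```

An extension or a doctype alone is weak evidence of format. A registry-based tool that uses the PRONOM identifier listed above would normally be preferred for production identification.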

Notes

General

Relationship between W3C and WHATWG: The relationship between W3C and the WHATWG is described in the History section from the HTML 5 specification. This note is based largely on that section. W3C had stopped development of HTML in 1998, with the redirection of focus onto the XML-based XHTML. XHTML 1.0, as essentially equivalent to HTML 4.01, did not raise compatibility issues for browser vendors and was well adopted. However, the 2003 W3C Recommendation for XForms, positioned as the next generation of Web forms, would have required browsers to implement rendering engines that were incompatible with many existing HTML Web pages. In 2004, Apple, Mozilla, and Opera jointly announced their intent to continue working on extensions to HTML and formed the WHATWG. To quote from the official history, "The WHATWG was based on several core principles, in particular that technologies need to be backwards compatible, that specifications and implementations need to match even if this means changing the specification rather than the implementations, and that specifications need to be detailed enough that implementations can achieve complete interoperability without reverse-engineering each other."

In 2007, after a change of mind, W3C formed a working group to work with WHATWG. Until 2011, W3C's working group and WHATWG worked together under the same editor, Ian Hickson. In 2011, the groups concluded that they had different goals. The W3C wanted to reach closure on an HTML 5.0 Recommendation, while the WHATWG wanted to maintain the specification for HTML continuously and keep adding new features. Since then, the W3C working group responsible for HTML has been adopting from the WHATWG those patches that address bugs and enhancements already widely supported in browsers, and has published a sequence of W3C Recommendations.

In May 2019, W3C and WHATWG signed an agreement to collaborate on developing a single version of the HTML and DOM specifications. See Format Specifications, below.

Frameworks: Frameworks used for building web pages and websites are software toolkits that facilitate the use of CSS and JavaScript. They come in different flavors. A CSS framework, also called a "front-end" framework, is a ready-made toolkit that aims to make web design using the Cascading Style Sheets language easier and more standards-compliant. Examples are Bootstrap and Foundation. Other frameworks focus on facilitating the use of JavaScript. JavaScript frameworks include AngularJS, Ember.js, and Backbone.js. A common use of JavaScript frameworks is to build single-page applications (SPAs).

Many individuals and small businesses use another approach to build their websites. Businesses such as WordPress.com and Squarespace offer hosting, templates, and design tools on a single platform. They provide one-stop shopping allowing the non-technical to build and maintain websites. WordPress.com is a hosting platform based on the open-source WordPress.org project. Squarespace templates use HTML 5. WordPress templates are known as "themes"; the version of HTML declared is controlled by the theme. The compilers of this resource have not investigated differences among the features used in HTML files generated using different web platforms and frameworks. Comments welcome.

History

HTML 5, also called HTML5, was the first update since HTML 4.01, which was published as a W3C Recommendation in December 1999. HTML 5 builds both on experience with XHTML and on extensions built by browser vendors. Requirements included support for varying screen sizes; more reliable and interoperable support for audio and video; and features to support interactive applications.

The first working draft of HTML 5 was published in January 2008. Adoption of HTML 5 was gradual between 2008 and 2012, and then steady. HTML 5 was finally issued as a W3C Recommendation in 2014.

For a more complete discussion and chronology of versions for the HTML format, see HTML_family.


Format specifications


Useful references

URLs


Last Updated: 05/18/2023