Sustainability of Digital Formats: Planning for Library of Congress Collections


Web Video Text Tracks Format (WebVTT)

Format Description Properties

Identification and description

Full name Web Video Text Tracks Format (WebVTT)

Web Video Text Tracks Format (WebVTT) is defined by the World Wide Web Consortium (W3C), an international community that develops open standards for the long-term growth of the web. It is specified in WebVTT: The Web Video Text Tracks Format, a W3C Candidate Recommendation (referenced throughout this document).

The WebVTT time-indexed file format is intended for marking up external text track resources in connection with the HTML <track> element, specifically HTML 5. The <track> tag specifies text tracks for <audio> and <video> elements. WebVTT files provide captions or subtitles for video content.

Structure of WebVTT:

According to the specification, the WebVTT box model consists of three elements:

  • video viewport: rendering area for regions/cues
  • regions: subareas of video viewport, group cues together
  • cues: boxes with cue lines

WebVTT files are containers for chunks of data that are time-aligned with a video or audio resource. A file starts with a header, followed by a series of data blocks. Data blocks with a start and end time are WebVTT cues. Per the HTML specification, the kinds of data a file can carry are subtitles, captions, descriptions, chapters, and metadata. A single WebVTT file can contain data of only one kind; for example, a chapter file cannot also serve as a metadata file. Caption and subtitle cues are rendered as overlays on top of the video viewport.

A WebVTT file must consist of a WebVTT file body: the string “WEBVTT”, followed by data blocks, line terminators, and other optional characters.

Comment blocks can be included. Each is preceded by a blank line, starts with the word “NOTE”, and ends with a blank line.
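
The block structure described above can be sketched in Python. This is a minimal illustration, not a conforming parser: the function names split_blocks and classify and the sample text are assumptions made for this example, and the classification only looks at the first lines of each block.

```python
import re


def split_blocks(text: str) -> list[list[str]]:
    """Split WebVTT text into blocks separated by one or more blank lines."""
    # Normalize CRLF and CR line terminators to LF before splitting.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    return [block.split("\n") for block in blocks if block]


def classify(block: list[str], index: int) -> str:
    """Give a rough label to a block based on how it starts."""
    first = block[0]
    if index == 0 and first.lstrip("\ufeff").startswith("WEBVTT"):
        return "header"
    if first == "NOTE" or first.startswith(("NOTE ", "NOTE\t")):
        return "comment"
    if first.startswith("STYLE"):
        return "style"
    # A cue has a timing line, optionally preceded by an identifier line.
    if any("-->" in line for line in block[:2]):
        return "cue"
    return "other"


# Hypothetical sample used only for this illustration.
SAMPLE = (
    "WEBVTT\n"
    "\n"
    "NOTE This is a comment\n"
    "\n"
    "00:00:00.000 --> 00:00:12.000\n"
    "This building was built in 1954.\n"
)

kinds = [classify(block, i) for i, block in enumerate(split_blocks(SAMPLE))]
print(kinds)  # ['header', 'comment', 'cue']
```

Splitting on blank lines first keeps the sketch short, since every data block in the format is delimited that way.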

Example of WebVTT:

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:12.000
This building was built in 1954.

00:00:13.050 --> 00:00:15.800
For its 50th anniversary, though, it was remodeled.
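
As an illustration of how the cue timing lines in this example can be read, the following Python sketch converts a timing line into start and end offsets in milliseconds. The names to_millis and parse_timing are hypothetical, and the sketch assumes the hh:mm:ss.ttt timestamp form (with optional hours) and ignores any cue settings that follow the end timestamp.

```python
import re

# Hours are optional and may have more than two digits; minutes and
# seconds run from 00 to 59; the fraction is exactly three digits.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def to_millis(stamp: str) -> int:
    """Convert a timestamp such as 00:00:13.050 to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_timing(line: str) -> tuple[int, int]:
    """Return (start, end) in milliseconds for a cue timing line."""
    start, arrow, rest = line.partition("-->")
    fields = rest.split()
    if not arrow or not fields:
        raise ValueError(f"not a cue timing line: {line!r}")
    # Anything after the end timestamp would be cue settings; skip it.
    return to_millis(start.strip()), to_millis(fields[0])


start, end = parse_timing("00:00:13.050 --> 00:00:15.800")
print(end - start)  # 2750
```

Working in integer milliseconds avoids floating-point rounding when cue durations are compared or summed.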

Uses of WebVTT:

According to the specification, the main use for WebVTT files is captioning or subtitling video content. WebVTT files can also carry time-aligned metadata for delivering paired cues, chapters for navigating a file, and text descriptions of video that convey visual context.

In Subtitles, Captions, WebVTT, HLS, and Those Magic Flags (January 2020), Phil Cluff states, “WebVTT isn’t just used for subtitles and captions (though those are the primary use cases), it can also be used for other forms of structured metadata that you might want to deliver alongside your content...WebVTT strikes an elegant balance between functionality, readability, and extensibility, being the only specification flexible enough to have a place to carry structured metadata. WebVTT is supported seamlessly on a comprehensive set of web players and OTT devices, which makes it great for streaming delivery.”

Production phase WebVTT files can be used across any production phase. A variety of software programs are available to aid users in creating, editing, converting, validating, and publishing WebVTT files.
Relationship to other formats
    Used by HTML_5, HyperText Markup Language 5. WebVTT files are created to display timed text in connection with the HTML5 <track> element.
    May contain CSS, Cascading Style Sheet. Style sheets applied to an HTML page containing a <video> element can target WebVTT cues/regions. Style sheets can also be embedded in WebVTT files.
    May contain JSON, JSON Data Interchange Format. WebVTT files can consist of time-aligned metadata that can be any string and is often provided as a JSON construct.
    Subtype of SRT, SubRip Format. WebVTT was broadly based on SRT and was initially called WebSRT, with the same .srt extension. Later it was renamed to WebVTT and introduced with the <track> tag for HTML5.
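
Because WebVTT grew out of SRT, a simple SRT file can often be converted with very little work. The Python sketch below is a minimal illustration under stated assumptions: the function name srt_to_webvtt is hypothetical, the input is assumed to be plain SRT with comma-separated milliseconds in its timing lines, and the numeric SRT counters are left in place on the assumption that they are acceptable as cue identifiers. It does not handle SRT-specific formatting tags.

```python
import re

# SRT writes milliseconds after a comma; WebVTT uses a full stop.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    converted = []
    for line in text.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten; cue text is left untouched.
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    body = "\n".join(converted).strip("\n")
    # Prepend the required header and a blank line before the first cue.
    return "WEBVTT\n\n" + body + "\n"


# Hypothetical SRT input used only for this illustration.
SRT_SAMPLE = (
    "1\n"
    "00:00:00,000 --> 00:00:12,000\n"
    "This building was built in 1954.\n"
)

print(srt_to_webvtt(SRT_SAMPLE))
```

Restricting the substitution to lines that contain the arrow keeps commas inside the caption text from being altered.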

Local use

LC experience or existing holdings The Packard Campus uses both sidecar and embedded captions in preservation and access files via WebVTT, SRT, and SCC. See FADGI's 2022 Survey Results: The Current State of Accessibility Features for Audiovisual Collections Content in Five FADGI Institutions for more details.
LC preference The Library of Congress has not defined format preferences for caption or subtitle files.

Sustainability factors


Disclosure

WebVTT is an open specification published by the World Wide Web Consortium. The WebVTT specification is based on the Draft Community Group Report of the Web Media Text Tracks Community Group and is produced by the W3C Timed Text Working Group as a Candidate Recommendation, with the intention that it become a W3C Recommendation.


Documentation

WebVTT: The Web Video Text Tracks Format – W3C Candidate Recommendation (April 2019)

WebVTT: The Web Video Text Tracks Format – Draft Community Group Report (February 2023)


Adoption

WebVTT was designed as an extension of SRT (fdd000569) that adds useful optional features not available in SRT. Because of those added features, WebVTT may not be supported on as many players as SRT. The article WebVTT (Web Video Text Tracks) of May 2021 (Wayback Machine link) states, “The WebVTT file format is supported by most video players, streaming platforms, authoring tools, editing software, including: YouTube, Microsoft Player Framework, Vimeo, Adobe Premiere Pro, DVD Studio Pro,” to name a few. See link for full list.

Comments welcome.

    Licensing and patents

Contributions to the repository are licensed by Contributors under the W3C Software and Document License. Contributions to Specifications are made under the W3C Community Contributor License Agreement (CLA).

The W3C Patent Policy has the goal of assuring that all W3C Recommendations can be implemented on a royalty-free basis.



Transparency

WebVTT files are text files that are saved in the Video Text Track (VTT) format, so they can be opened and edited in a plain text editor.



Self-documentation

WebVTT supports time-based metadata tracks, which carry any additional information the developer needs to include (base64-encoded images, JSON, additional text, or another text-based format). According to the specification, “A web app can listen for cue events, extract the text of each cue as it fires, parse the data and then use the results to make DOM changes (or perform other JavaScript or CSS tasks) synchronized with media playback.”
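
The specification describes this pattern for a web app reacting to cue events in the browser. The Python sketch below shows only the offline counterpart of the "extract and parse" step: reading a metadata file and decoding each cue payload as JSON. The function name metadata_cues and the sample payload fields are hypothetical, and the sketch assumes every cue payload in the file is valid JSON.

```python
import json
import re


def metadata_cues(vtt_text: str):
    """Yield (timing line, decoded JSON payload) for each cue in the text."""
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text) if b.strip()]
    # The first block is the header, so cues start at the second block.
    for block in blocks[1:]:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is the cue payload.
                payload = "\n".join(lines[i + 1:])
                yield line.strip(), json.loads(payload)
                break


# Hypothetical metadata track used only for this illustration.
METADATA_SAMPLE = (
    "WEBVTT\n"
    "\n"
    "00:00:00.000 --> 00:00:12.000\n"
    '{"title": "Introduction", "year": 1954}\n'
)

for timing, data in metadata_cues(METADATA_SAMPLE):
    print(timing, data["title"])
```

Blocks without a timing line, such as comments, are skipped because the inner loop never finds an arrow in them.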


External dependencies

None beyond availability of supporting software.


Technical protection considerations

WebVTT IANA Security Considerations: “Text track files themselves pose no immediate risk unless sensitive information is included within the data. Implementations, however, are required to follow specific rules when processing text tracks, to ensure that certain origin-based restrictions are honored. Failure to correctly implement these rules can result in information leakage, cross-site scripting attacks, and the like.”


Quality and functionality factors

Normal rendering

Good support. WebVTT files are simple text files encoded as UTF-8.

Integrity of document structure

Good support. WebVTT files must follow a specified format described in the W3C specification that includes the WebVTT file body encoded as UTF-8.

According to Andreas Tai in Balisage Paper: WebVTT versus TTML: XML considered harmful for Web Captions? August 2013, “WebVTT does not use a formal grammar to describe the syntax but a sequence of rules written in normative prose.”

The WebVTT specification states, “As with any text-based format, it is possible to construct malicious content that might cause buffer over-runs, value overflows, and the like.” And “Implementers should take care in implementing a parser that over-long lines, field values, or encoded values do not cause security problems.”


Integrity of layout and display

While it is not essential to the function of a WebVTT file, the text can be styled and positioned to display as the creator pleases. Style can be defined directly in the text file by using the string “STYLE” after any headers but before the first cue. Style customizations include size, positioning, and fonts.

Style sheets applied to an HTML page containing a <video> element can target WebVTT cues/regions. Style sheets can also be embedded in WebVTT files.

The WebVTT specification states “WebVTT can embed CSS style sheets, which will be applied in user agents that support CSS. Under these circumstances, the privacy and security considerations of CSS apply, with the following caveats.”


Support for mathematics, formulae, etc.

Little to no information is available on WebVTT’s support for mathematics, chemical formulae, diagrams, etc.


Functionality beyond normal rendering

None known.


File type signifiers and format identifiers

Tag | Value | Note
Filename extension | vtt |
Internet Media Type | text/vtt |
Magic numbers | EF BB BF 57 45 42 56 54 54 0A | WebVTT files all begin with one of these byte sequences (where "EOF" means the end of the file).
 | EF BB BF 57 45 42 56 54 54 0D |
 | EF BB BF 57 45 42 56 54 54 20 |
 | EF BB BF 57 45 42 56 54 54 09 |
 | EF BB BF 57 45 42 56 54 54 EOF |
 | 57 45 42 56 54 54 0A |
 | 57 45 42 56 54 54 0D |
 | 57 45 42 56 54 54 20 |
 | 57 45 42 56 54 54 09 |
 | 57 45 42 56 54 54 EOF |
Pronom PUID | fmt/1454 |
Wikidata Title ID | Q3566973 | File format.

Notes


General

Downloadable WebVTT sample.

WebVTT files all begin with the same byte pattern: an optional UTF-8 BOM, the ASCII string "WEBVTT", and finally a space, tab, line break, or the end of the file.
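
A format identification tool could test for this signature along the following lines. This Python sketch is illustrative only: the function name looks_like_webvtt is an assumption, and a matching signature does not guarantee that the rest of the file is well formed.

```python
UTF8_BOM = b"\xef\xbb\xbf"
SIGNATURE = b"WEBVTT"
# Bytes allowed right after the signature: space, tab, LF, or CR.
ALLOWED_NEXT = (b" ", b"\t", b"\n", b"\r")


def looks_like_webvtt(data: bytes) -> bool:
    """Return True if the byte string starts with a WebVTT signature."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    if not data.startswith(SIGNATURE):
        return False
    following = data[len(SIGNATURE):len(SIGNATURE) + 1]
    # An empty slice means the file ends right after the signature (EOF).
    return following == b"" or following in ALLOWED_NEXT


print(looks_like_webvtt(b"WEBVTT\n\n00:00:00.000 --> 00:00:12.000\nHello\n"))  # True
print(looks_like_webvtt(b"1\n00:00:00,000 --> 00:00:12,000\nHello\n"))  # False
```

Only the first ten bytes of a file need to be read for this check, so it is cheap to run across large collections.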


Interesting - According to Andreas Tai in Balisage Paper: WebVTT versus TTML: XML considered harmful for Web Captions? August 2013, “Another difference to the TTML use case is, that WebVTT is designed to only serve as a web distribution format for subtitles. There is no ambition for it to be used as an intermediary format. And although later in the specification process documents and extensions were published to support the translation from existing US broadcast standards into WebVTT, support for legacy formats had not been a requirement from the beginning.”


History

WebVTT was initially created and released in 2010. Early drafts were written by WHATWG (Web Hypertext Application Technology Working Group) after discussions about which caption format HTML5 should support: TTML (fdd000568) or a new standard based on the SubRip format, SRT (fdd000569). The new format that was chosen was called WebSRT and shared the same .srt extension, before the name was changed to WebVTT.

In November 2014, the Timed Text Working Group published a First Working Draft of WebVTT. In April 2019, the group published an updated Candidate Recommendation of WebVTT.

Format specifications

Useful references


Last Updated: 05/18/2023