Sustainability of Digital Formats: Planning for Library of Congress Collections

Extensible Binary Meta Language

Identification and description

Full name Extensible Binary Meta Language

Extensible Binary Meta Language (EBML) format is a container format designed as a binary equivalent of XML. While EBML is agnostic to the types of data it can contain, it is perhaps best known as the basis for the Matroska Multimedia Container, which uses a specific profile of EBML for the carriage of video, audio, subtitles, captions and other types of embedded data. This description focuses on EBML for the Matroska implementation.

History. The Internet Engineering Task Force (IETF) published EBML as Proposed Standard RFC 8794 in July 2020. The full version history is available on the IETF Datatracker. For more details on the history of EBML, see Notes.

An EBML Document consists of only two components, an EBML Header and an EBML Body. The semantic building blocks of EBML Documents are 'elements' made of three pieces of data (tag, length, and value), which makes them easy to parse and also allows for selective data parsing. The EBML structure additionally allows for hierarchical arrangement to support complex structural formats in an efficient manner.

The required EBML Header provides the context needed to interpret and process the structure and meaning of the EBML Document as a whole. The components within the header include the EBML DocType, the name under which an EBML Schema defines the EBML Document in the same way that schemas for XML define an XML document. See the CELLAR Working Group EBMLSchema.xsd and EBML Specification repository on GitHub. Unlike XML, an EBML Document requires an EBML Schema to be interpreted semantically. The DocType identifies the content of the EBML Body that follows: in the case of the Matroska EBML implementation, this would be EBMLSchema docType="matroska", and for a WebM file it would be EBMLSchema docType="webm". The EBML Header also contains version information for both the EBML parser and the EBML DocType interpreter used to create the file.

Unlike XML schemas, an EBML Schema documents all versions of a DocType's definition rather than using separate EBML Schemas for each version of a DocType. See EBML Schema Example. An EBML Schema must declare one EBML Element at Root Level (referred to as the Root Element but known as the Segment in Matroska) that occurs exactly once within an EBML Document. The Void Element MAY also occur at Root Level but is not a Root Element.

According to the IETF specification version 17 from July 2020, "the EBML Header MUST contain a single Master Element with an Element Name of EBML and Element ID of 0x1A45DFA3 (see Section 11.2.1) and any number of additional EBML Elements within it. The EBML Header of an EBML Document that uses an EBMLVersion of 1 MUST only contain EBML Elements that are defined as part of this document."
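As an illustration of how a reader might pull the DocType and version fields out of an EBML Header, here is a minimal Python sketch. The function names and the sample bytes are hypothetical. The child Element IDs (EBMLVersion 0x4286, EBMLReadVersion 0x42F7, DocType 0x4282, DocTypeVersion 0x4287, DocTypeReadVersion 0x4285) are taken from RFC 8794. The sketch assumes a known header size and does not apply the default values the specification defines for absent elements.

```python
EBML_HEADER_ID = 0x1A45DFA3

# Subset of EBML Header children handled by this sketch: ID -> (name, kind).
HEADER_FIELDS = {
    0x4286: ("EBMLVersion", "uint"),
    0x42F7: ("EBMLReadVersion", "uint"),
    0x4282: ("DocType", "string"),
    0x4287: ("DocTypeVersion", "uint"),
    0x4285: ("DocTypeReadVersion", "uint"),
}


def _vint(buf, pos, keep_marker):
    """Read one variable-size integer; return (value, next_position)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("widths above 8 octets are outside this sketch")
    width = 9 - first.bit_length()  # leading zero bits + 1
    raw = int.from_bytes(buf[pos:pos + width], "big")
    if not keep_marker:  # sizes drop the marker bit, IDs keep it
        raw &= (1 << (7 * width)) - 1
    return raw, pos + width


def parse_ebml_header(buf):
    """Return the recognized EBML Header fields as a dict."""
    element_id, pos = _vint(buf, 0, keep_marker=True)
    if element_id != EBML_HEADER_ID:
        raise ValueError("data does not start with an EBML Header")
    size, pos = _vint(buf, pos, keep_marker=False)
    end = pos + size
    if end > len(buf):
        raise ValueError("truncated EBML Header")
    fields = {}
    while pos < end:
        child_id, pos = _vint(buf, pos, keep_marker=True)
        child_size, pos = _vint(buf, pos, keep_marker=False)
        data = buf[pos:pos + child_size]
        pos += child_size
        if child_id in HEADER_FIELDS:
            name, kind = HEADER_FIELDS[child_id]
            if kind == "uint":
                fields[name] = int.from_bytes(data, "big")
            else:  # strings may carry trailing null octets
                fields[name] = data.split(b"\x00", 1)[0].decode("ascii")
    return fields


# Hand-built sample header declaring the "matroska" DocType.
sample = (
    bytes.fromhex("1A45DFA3 9B 4286 81 01 42F7 81 01 4282 88")
    + b"matroska"
    + bytes.fromhex("4287 81 04 4285 81 02")
)
print(parse_ebml_header(sample))
```

Unrecognized children are skipped by their declared size, which is the selective parsing that the tag, length, and value layout makes possible.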

The EBML Body that follows the EBML Header contains a series of EBML Elements which serve as the building blocks for the content. Each EBML Element combines three parts: an Element ID which uniquely identifies a particular Element in the file, an Element Data Size which records the length in octets of the Element Data (although the data size may be unknown), and finally, the Element Data. Both the Element ID and the Element Data Size are variable-size integers, while the Element Data includes either binary data, one or more other EBML Elements, or both. The end of the EBML Body, as well as the end of the EBML Document that contains the EBML Body, is reached at whichever comes first: the beginning of a new EBML Header at the Root Level or the end of the file.
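The three-part element layout can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the sketch assumes variable-size integers of at most 8 octets. In a variable-size integer, the number of leading zero bits in the first octet, plus one, gives the width in octets. An Element Data Size with all value bits set signals an unknown size, represented here as None.

```python
def read_vint(buf, pos, keep_marker):
    """Decode one variable-size integer at buf[pos]; return (value, width).

    Element IDs keep the marker bit (keep_marker=True); Element Data
    Sizes drop it (keep_marker=False).
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("widths above 8 octets are outside this sketch")
    width = 9 - first.bit_length()  # leading zero bits + 1
    if pos + width > len(buf):
        raise ValueError("truncated variable-size integer")
    raw = int.from_bytes(buf[pos:pos + width], "big")
    if keep_marker:
        return raw, width
    all_ones = (1 << (7 * width)) - 1
    value = raw & all_ones
    # All value bits set is reserved to mean "size unknown".
    return (None if value == all_ones else value), width


def read_element_header(buf, pos=0):
    """Return (element_id, data_size, data_start) for the element at pos."""
    element_id, id_width = read_vint(buf, pos, keep_marker=True)
    data_size, size_width = read_vint(buf, pos + id_width, keep_marker=False)
    return element_id, data_size, pos + id_width + size_width


# Hand-built sample: an EBML Header whose only child is DocType "webm".
sample = bytes.fromhex("1A45DFA3 87 4282 84") + b"webm"
element_id, size, start = read_element_header(sample)
print(hex(element_id), size, start)
child_id, child_size, child_start = read_element_header(sample, start)
print(hex(child_id), sample[child_start:child_start + child_size])
```

Because each element announces its own length, a reader can skip any element it does not need by jumping ahead by the Element Data Size.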

Each EBML Element must declare its EBML Element Type from one of the following options: Signed Integer Element, Unsigned Integer Element, Float Element, String Element, UTF-8 Element, Date Element, Master Element and Binary Element. The EBML Element Type defines a concept of storing data within an EBML Element that describes such characteristics as length, endianness, and definition.
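To make the role of the EBML Element Type concrete, the following Python sketch decodes raw Element Data according to a declared type. The function name is hypothetical, and the type labels follow the values used in EBML Schemas ("integer", "uinteger", "float", "string", "utf-8", "date", "binary"). It assumes the storage rules of RFC 8794: big-endian integers, IEEE 754 floats of 0, 4, or 8 octets, and dates stored as signed nanosecond offsets from 2001-01-01T00:00:00 UTC. Master Elements are left out because their data is a sequence of child elements rather than a single value.

```python
import struct
from datetime import datetime, timedelta, timezone

EBML_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def decode_value(element_type, data):
    """Decode Element Data for the non-master EBML Element Types."""
    if element_type == "uinteger":
        return int.from_bytes(data, "big")  # empty data decodes as 0
    if element_type == "integer":
        return int.from_bytes(data, "big", signed=True)
    if element_type == "float":
        if len(data) == 0:
            return 0.0
        if len(data) == 4:
            return struct.unpack(">f", data)[0]
        if len(data) == 8:
            return struct.unpack(">d", data)[0]
        raise ValueError("Float Elements are 0, 4, or 8 octets long")
    if element_type == "string":
        return data.split(b"\x00", 1)[0].decode("ascii")
    if element_type == "utf-8":
        return data.split(b"\x00", 1)[0].decode("utf-8")
    if element_type == "date":
        nanoseconds = int.from_bytes(data, "big", signed=True)
        # datetime has microsecond resolution, so sub-microsecond
        # precision is dropped in this sketch.
        return EBML_EPOCH + timedelta(microseconds=nanoseconds // 1000)
    if element_type == "binary":
        return bytes(data)
    raise ValueError(f"unsupported element type: {element_type}")


print(decode_value("uinteger", b"\x01\x00"))
print(decode_value("date", (10**9).to_bytes(8, "big", signed=True)))
```

The type is not stored in the file itself. A reader looks it up in the EBML Schema by Element ID, which is why a schema is needed to interpret a document semantically.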

There are two Global Elements with unique characteristics that can be found in any EBML Document: the CRC-32 Element and the Void Element. These special optional Elements can be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Body. The CRC-32 Element contains a 32-bit Cyclic Redundancy Check value of all the Element Data of the Parent Element, excepting the CRC-32 Element itself. The Void Element has two roles: 1) to void data, or to avoid unexpected behaviors when using damaged data, in which case its content is discarded, and 2) to reserve space in a sub-element for later use.
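A CRC-32 check of a Master Element's content might look like the Python sketch below. The function names are hypothetical. The sketch relies on details from RFC 8794 that this description does not spell out: the CRC-32 Element has Element ID 0xBF, carries a 4-octet value stored in little-endian order, and is the first child of its Parent Element. It also assumes that Python's zlib.crc32 computes the IEEE CRC-32 variant the specification calls for.

```python
import zlib

CRC32_ID = b"\xBF"
CRC32_SIZE = b"\x84"  # Element Data Size of 4 octets


def crc32_element(sibling_data):
    """Build a CRC-32 Element covering the given sibling elements."""
    checksum = zlib.crc32(sibling_data) & 0xFFFFFFFF
    return CRC32_ID + CRC32_SIZE + checksum.to_bytes(4, "little")


def crc32_is_valid(master_data):
    """Check the Element Data of a Master Element that starts with CRC-32."""
    if master_data[:2] != CRC32_ID + CRC32_SIZE:
        raise ValueError("Master Element does not start with a CRC-32 Element")
    stored = int.from_bytes(master_data[2:6], "little")
    computed = zlib.crc32(master_data[6:]) & 0xFFFFFFFF
    return stored == computed


# Protect a single DocType child, then simulate corruption of its last octet.
children = bytes.fromhex("4282 84") + b"webm"
protected = crc32_element(children) + children
print(crc32_is_valid(protected))
print(crc32_is_valid(protected[:-1] + b"x"))
```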

An interesting feature of EBML is the ability to update Element Data in place, without rewriting the entire EBML Document, which causes minimal disruption to the rest of the document. This can save time and resources, as Matroska multimedia files can be large and complex.
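One way such an in-place update could work is to overwrite an element with a shorter replacement and fill the leftover octets with a Void Element, so that nothing after it has to move. The Python sketch below is a hypothetical illustration of that idea, not a description of how any particular muxer does it. The function names are invented, the Void Element ID 0xEC comes from RFC 8794, and the sample element uses ID 0x7BA9 (the ID Matroska assigns to Title) purely as sample data. For brevity the Void Element is limited to a one-octet size field.

```python
VOID_ID = 0xEC


def void_element(total_len):
    """Build a Void Element that occupies exactly total_len octets."""
    payload = total_len - 2  # one octet of ID, one octet of size
    if not 0 <= payload <= 126:
        raise ValueError("this sketch only builds Void Elements of 2 to 128 octets")
    return bytes([VOID_ID, 0x80 | payload]) + bytes(payload)


def replace_in_place(doc, start, old_total, new_element):
    """Overwrite doc[start:start + old_total] without changing its length."""
    spare = old_total - len(new_element)
    if spare == 0:
        padding = b""
    elif spare >= 2:
        padding = void_element(spare)
    else:
        # A negative gap means the new element is too large; a gap of one
        # octet is too small to hold a Void Element.
        raise ValueError("replacement does not fit in the available space")
    doc[start:start + old_total] = new_element + padding


old = bytes.fromhex("7BA9 8B") + b"Hello world"
new = bytes.fromhex("7BA9 82") + b"Hi"
doc = bytearray(b"HEAD" + old + b"TAIL")  # HEAD and TAIL stand in for other data
replace_in_place(doc, 4, len(old), new)
print(len(doc), bytes(doc[4:11]).hex(), bytes(doc[-4:]))
```

The surrounding data keeps its offsets, so any stored positions that point elsewhere in the file remain valid.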

Production phase Can be used as initial, middle, or final-state format
Relationship to other formats
    Used by Matroska, Matroska Multimedia Container
    Defined via XML, Extensible Markup Language (XML)

Local use

LC experience or existing holdings See Matroska Multimedia Container
LC preference See Matroska Multimedia Container

Sustainability factors


Disclosure Fully documented through the public IETF Datatracker, which includes all versions of the specification starting from the earliest version of September 23, 2016, as well as history and reviewer feedback. Published as Proposed Standard RFC 8794 in July 2020.

The EBML specification is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

    Documentation Published as Proposed Standard RFC 8794 in July 2020.
Adoption While EBML is a generalized format for any type of data and the RFC 8794 specification is intended to define how other EBML-based formats can be defined, implementation is limited to Matroska Multimedia Container and WebM. Comments welcome.
    Licensing and patents Depends on implementation. See Matroska Multimedia Container and WebM
Transparency Depends upon the included encodings, some of which depend upon algorithms and tools to read and require sophistication to build such tools.
Self-documentation See Matroska Multimedia Container and WebM
External dependencies None
Technical protection considerations Aside from the CRC-32 Element, EBML does not offer options for data integrity monitoring, encryption or other security options. According to the specification describing requirements for EBML readers, "If a Master Element contains a CRC-32 Element that doesn't validate, then the EBML Reader MAY ignore all contained data except for Descendant Elements that contain their own valid CRC-32 Element."

Quality and functionality factors

Moving Image
Normal rendering See Matroska Multimedia Container
Clarity (high image resolution) See Matroska Multimedia Container
Functionality beyond normal rendering See Matroska Multimedia Container
Sound
Normal rendering See Matroska Multimedia Container
Fidelity (high audio resolution) See Matroska Multimedia Container
Multiple channels See Matroska Multimedia Container
Support for user-defined sounds, samples, and patches See Matroska Multimedia Container
Functionality beyond normal rendering See Matroska Multimedia Container

File type signifiers and format identifiers

Tag | Value | Note
Magic numbers | HEX: 0x1A45DFA3 | From EBML specification, section 11.2.1. Used for all EBML-based files including Matroska Multimedia Container and WebM
Other | See note. | EBMLSchema docType in the header defines the content of the body. Matroska files will have the "matroska" value and WebM files will have "webm" value.
Indicator for profile, level, version, etc. | See note. | An EBML Document handles 2 different versions: the version of the EBML Header (defined in EBMLVersion) and the version of the EBML Body (defined in EBMLDocTypeVersion). Both versions are meant to be backward compatible. An EBML parser can read an EBML Header if it can read either the EBMLVersion version or a version equal or higher than the one found in EBMLReadVersion.
Pronom PUID | See note. | PRONOM has no corresponding entry as of July 2020
Wikidata Title ID | Q1273936 |
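
As a rough illustration of how these identifiers might be used, the Python sketch below checks a stream for the magic number and applies a simplified version test. Both function names are hypothetical. The version test reduces the compatibility rule to a single comparison (the reader's supported version must be at least the minimum read version the file declares), which is an assumption made for brevity rather than a full statement of the specification.

```python
import io

EBML_MAGIC = bytes.fromhex("1A45DFA3")


def starts_with_ebml_header(stream):
    """Return True when a binary stream begins with the EBML Header ID."""
    return stream.read(4) == EBML_MAGIC


def reader_can_parse(supported_version, read_version):
    """Simplified check against EBMLReadVersion or DocTypeReadVersion."""
    return supported_version >= read_version


print(starts_with_ebml_header(io.BytesIO(bytes.fromhex("1A45DFA3 9B"))))
print(starts_with_ebml_header(io.BytesIO(b"RIFF....")))
print(reader_can_parse(supported_version=1, read_version=1))
```

The magic number alone only shows that a file is EBML-based. Telling Matroska from WebM requires reading the DocType value from the header.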

Notes


The standardization of EBML has its roots in the European PREFORMA (PREservation FORMAts for culture information/e-archives) project, which had the stated intention "to research critical factors in the quality of standard implementation in order to establish a long-term sustainable ecosystem around developed tools with a variety of stakeholder groups." PREFORMA started in 2014 and was co-funded by the European Commission under its Seventh Framework Programme (link through EU Web archive), which was active from 2007 to 2013. Among the projects funded through this data call was the CELLAR (Codec Encoding for LossLess Archiving and Realtime) working group organized through IETF, whose charter lists these goals:

  • "FFV1 is a lossless video codec and Matroska is an extensible media container based on EBML (Extensible Binary Meta Language), a binary XML format. There are open source implementations of both formats, and an increasing interest in and support for use of FFV1 and Matroska. However, there are concerns about the sustainability and credibility of existing specifications for the long-term use of these formats. These existing specifications require broader review and formalization in order to encourage widespread adoption....Using existing work done by the development communities of Matroska, FFV1, and FLAC, the Working Group will formalize specifications for these open and lossless formats. In order to provide authoritative, standardized specifications for users and developers, the Working Group will seek consensus throughout the process of refining and formalizing these standards."

Formalizing EBML was a prerequisite for firming up Matroska, so sections of the Matroska specification which more directly pertained to EBML were moved into the EBML specification "so that the Matroska specification may build upon the EBML specification rather than act redundantly to it. The updated EBML specification includes documentation on how to define an EBML Schema which is a set of Elements with their definitions and structural requirements rendered in XML form. Matroska’s documentation now defines Matroska through an EBML Schema as a type of EBML expression." According to Ashley Blewer and Dave Rice in their 2016 iPres paper, "In 2004 (two years after the origin of Matroska), Martin Nilsson produced an RFC draft of EBML, which extensively documented the format in Augmented Backus-Naur Form (ABNF). This draft was not published by the IETF but remained on the Matroska site as supporting documentation. Also in 2004, Dean Scarff provided draft documentation for a concept of the EBML Schema."

The PREFORMA project also funded the development of MediaConch (Media CONformance CHecker) open source software. Developed by MediaArea, MediaConch is an implementation checker, policy checker, reporter, and fixer that targets preservation-level audiovisual files specifically for Matroska, Linear Pulse Code Modulation (LPCM) and FF Video Codec 1 (FFV1).

The first IETF draft of the EBML specification (version 00) was published on September 23, 2016. The specification achieved RFC status and was published as RFC 8794 on July 22, 2020. The full version history is available on the IETF Datatracker.

Format specifications

Useful references


Last Updated: 09/21/2020