Sustainability of Digital Formats: Planning for Library of Congress Collections


Unified Speech and Audio Coding


Identification and description

Full name Unified Speech and Audio Coding
Description

Unified Speech and Audio Coding (USAC) is a bitstream encoding format for audio. According to the Audio Engineering Society (AES), the intention behind this codec was to combine the benefits of general audio coding and speech coding into a unified system. The ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard in early 2012. The format was published as an international standard ISO/IEC 23003-3 (also referred to as MPEG-D Part 3) and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012. These standards have subsequently been updated, and the latest as of February 2024 are ISO/IEC 23003-3:2020 and ISO/IEC 14496-3:2019.

According to the Moving Picture Experts Group, USAC incorporates techniques developed in previous MPEG standards: perceptually shaped quantization noise, parametric coding of the upper spectrum region and parametric coding of the stereo sound stage. USAC combines these techniques with a source coding technique: a human speech model of sound production. USAC relies on MDCT (modified discrete cosine transform)-based transform coding techniques known from MPEG-4 audio, speech coder elements like ACELP (Algebraic code-excited linear prediction), parametric coding tools such as MPEG-4 spectral band replication (SBR) and MPEG-D MPEG surround.

USAC files are encoded binary. The bitstream syntax is based on ISO/IEC 14496-3:2019, Section 4.4.

The fundamental features are:

  • A foundation built on the AAC codec.
  • Transform Coded Excitation, which uses short-term linear prediction to model how the human vocal tract shapes the speech spectrum.
  • ACELP coding tools including short- and long-term prediction filters.
  • Arithmetic Coding of Spectral Coefficients. While MPEG-4 AAC uses Huffman coding, USAC uses adaptive, context-dependent arithmetic coding.
  • Noise Filling. Coefficients that are quantized to zero are "filled in" by random noise with a mean value equal to the mean quantization error in that scale factor band.
  • Joint Stereo Coding 1, defined as the "sum/difference or mid/side (M/S) coding of channel pairs (left and right channel signals)." This can also be found in AAC.
  • Joint Stereo Coding 2, defined as "complex stereo prediction, which is a tool for efficient coding of channel pairs with level and/or phase differences between the channels."
  • eSBR: "enhanced spectral band replication"
  • MPS212 is an MPEG Surround 2-1-2 processing mode (mono-to-stereo synthesis) that uses the one-to-two channel up-mix module.
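Two of the tools listed above lend themselves to a short illustration. The following Python sketch shows sum/difference (M/S) coding of a channel pair and a much-simplified form of noise filling. The function names, the uniform quantizer with its `step` parameter, and the use of the band's mean absolute quantization error as the noise magnitude are assumptions made for illustration only. The normative tools in ISO/IEC 23003-3 operate on spectral coefficients and are considerably more involved.

```python
import random


def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference) samples."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def noise_fill(coeffs, step, rng):
    """Quantize one band, then fill zeroed coefficients with random noise.

    Assumption: the noise magnitude is the mean absolute quantization
    error of the band, and only the sign is random.
    """
    quantized = [round(c / step) for c in coeffs]
    restored = [q * step for q in quantized]
    errors = [abs(c - r) for c, r in zip(coeffs, restored)]
    level = sum(errors) / len(errors)
    return [
        rng.choice((-level, level)) if q == 0 else r
        for q, r in zip(quantized, restored)
    ]


# Hypothetical sample values, chosen only to exercise the functions.
mid, side = ms_encode([1.0, 0.5, -0.25], [1.0, 0.25, 0.25])
left, right = ms_decode(mid, side)
filled = noise_fill([4.0, 0.25, -0.5, 2.0], step=2.0, rng=random.Random(0))
```

When the two channels are similar, the side signal is close to zero and is cheap to code, which is the motivation for M/S coding. In the noise-filling sketch, the coefficients that survive quantization are left untouched and only the zeroed ones receive noise.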

According to the Moving Picture Experts Group, USAC has application in any area in which low-bit-rate transmission or storage is necessary and the audio content is an arbitrary mix of speech, speech plus music, and music. It is designed for both high and low quality: it emphasizes coding in the intermediate to low range, from 32 kb/s for stereo down to 12 kb/s for mono signals, and it improves compression performance for high-quality audio (64 kb/s and beyond).

USAC is built on the MPEG-4 AAC and High Efficiency AAC (HE-AAC) codecs but is not backwards compatible with either format. The AAC family of codec technologies is specified in ISO/IEC 13818-7 as AAC LC Profile, in ISO/IEC 14496-3 as AAC, High Efficiency AAC and High Efficiency AAC v2 Profiles, and in ISO/IEC 23003-3 as Extended High Efficiency AAC Profile. MPEG-4 AAC uses Huffman coding while USAC uses adaptive, context-dependent arithmetic coding.

USAC extends the HE-AACv2 range of use towards lower bitrates. As it additionally delivers at least the same quality as HE-AACv2 at higher rates, it also allows for applications requiring scalability over a large bitrate range.

The output of the USAC decoder can be further processed by MPEG Surround (MPS) (ISO/IEC 23003-1) or spatial audio object coding (SAOC) (ISO/IEC 23003-2).

USAC uses the concept of "profiles." The MPEG-D USAC standard (ISO/IEC 23003-3) defines three USAC profiles in Section 4.5: 1) MPEG-4 HE AACv2 compatibility, 2) Baseline USAC Profile, and 3) Extended HE-AAC Profile (xHE-AAC). Each profile has subsequent sub-profiles.

The MPEG-4 HE AACv2 compatibility profile retains all functionality and performance features of the AAC family. The specification outlines how these features are mapped between the formats.

The baseline USAC profile does not include the time-warped filterbank, the DFT-based harmonic transposer in enhanced spectral band replication, or the fractional delay decorrelator in MPEG Surround for mono-to-stereo upmixing (MPS212). The baseline profile has four levels that set the maximum number of channels, maximum sampling rate, maximum PCU, and maximum RCU. The number of pre-roll frames, numPreRollFrames, in an AudioPreRoll() extension payload shall not exceed 3. Decoders conforming to the baseline USAC profile shall support the full decoding and correct handling of the AudioPreRoll() extension. There are further restrictions on sampling rates for the baseline profile. Details are in ISO/IEC 23003-3:2020, 4.5.3 Baseline USAC profile.

The Extended High Efficiency AAC profile contains the audio object types 42 (USAC), 5 (SBR), 29 (PS) and 2 (AAC LC) as defined in ISO/IEC 14496-3. The Extended HE AAC profile is compatible with the MPEG-4 High Efficiency AAC v2 profile as defined in ISO/IEC 14496-3. There are seven levels for this profile.
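The audio object type numbers above are what a parser reads to tell a USAC stream from other AAC-family streams. The following Python sketch reads the object type from the first bytes of an MPEG-4 AudioSpecificConfig. It assumes the escape mechanism of ISO/IEC 14496-3, in which a 5-bit value of 31 signals that six further bits follow and the type is 32 plus those bits. AudioSpecificConfig is not discussed in this description, and the function name, the lookup table, and the sample byte strings are hypothetical.

```python
# Audio object types named for the Extended HE AAC profile.
AOT_NAMES = {2: "AAC LC", 5: "SBR", 29: "PS", 42: "USAC"}


def read_audio_object_type(config: bytes) -> int:
    """Return the audioObjectType found at the start of a config blob."""
    if len(config) < 2:
        raise ValueError("need at least 2 bytes")
    bits = int.from_bytes(config[:2], "big")  # first 16 bits, MSB first
    aot = bits >> 11                          # top 5 bits
    if aot == 31:                             # escape: read 6 more bits
        aot = 32 + ((bits >> 5) & 0x3F)
    return aot


# Constructed examples: 11111 001010 ... encodes 32 + 10 = 42 (USAC),
# and 00010 ... encodes 2 (AAC LC).
for blob in (bytes([0xF9, 0x40]), bytes([0x12, 0x10])):
    aot = read_audio_object_type(blob)
    print(aot, AOT_NAMES.get(aot, "other"))
```

Because 42 does not fit in five bits, USAC always uses the escape path, so a parser that reads only the first five bits will not recognize it.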

The output of the USAC decoder can be further processed by MPEG-D DRC (ISO/IEC 23003-4).

Relationship to other formats
    Affinity to AAC_MP4, Advanced Audio Coding (MPEG-4). As outlined in the specification, large parts of the USAC codec are inherited from the MPEG-4 HE AAC v2 profile. USAC retains all functionalities and performance features available in the AAC format family (AAC, HE AAC, HE AAC v2) but does not adopt all tools. A more detailed mapping of compatibility and differences appears in ISO/IEC 23003-3:2020 Section 4.5.2 MPEG-4 HE AACv2 compatibility: Table 1.

Local use

LC experience or existing holdings None
LC preference See the Library of Congress Recommended Formats Statement for format preferences for audio works.

Sustainability factors

Disclosure Fully disclosed through ISO/IEC standardization (through a paywall). Authored by the Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
Documentation The format was published as an international standard ISO/IEC 23003-3 (also referred to as MPEG-D Part 3) and also as an MPEG-4 Audio Object Type in ISO/IEC 14496-3:2009/Amd 3 in 2012. These standards have subsequently been updated, and the latest as of February 2024 are ISO/IEC 23003-3:2020 and ISO/IEC 14496-3:2019.
Adoption

ISO/IEC 23003-3:2020 states that the main focus of this codec is "applications in the field of typical broadcast scenarios, multimedia download to mobile devices, user-generated content such as podcasts, digital radio, mobile TV, audio books, etc."

The MPEG group lists two example use cases: 1) digital radio, mobile TV, and audio books, focusing on speech and speech with background noise, including announcements, advertisements, and narration; and 2) multimedia download and real-time play on mobile devices, focusing on various types of music and movie content.

There is very wide international support from academics, research institutions, and corporations. For example, the affiliations list from AES Convention Paper 8654, a group paper on the format, includes Audio Research Labs; Dolby Sweden AB; Fraunhofer Institute for Integrated Circuits IIS; NTT DOCOMO, INC.; Panasonic Corporation; Philips Research Laboratories; Samsung Electronics; Sony Corporation; and Université de Sherbrooke.

    Licensing and patents

Regarding patents, ISO/IEC 23003-3:2020 states "The International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may involve the use of a patent. ISO and IEC take no position concerning the evidence, validity and scope of this patent right. The holder of this patent right has assured ISO and IEC that he/she is willing to negotiate licences under reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this respect, the statement of the holder of this patent right is registered with ISO and IEC. Information may be obtained from the patent database available at www.iso.org/patents.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights other than those in the patent database. ISO and IEC shall not be held responsible for identifying any or all such patent rights."

Transparency Depends upon complex algorithms and tools to read; will require sophistication to build tools. An example of implementation can be seen in the MediaInfo codebase, parsing USAC metadata.
Self-documentation Technical (coding) information is contained in the headers for the "frames" that make up the bitstream.
External dependencies None. According to the IANA profile, USAC is vendor neutral and is supported by a wide range of encoders and decoders/players, including Multimedia, HLS Audio-Only Streams - IETF HTTP Live Streaming, SHOUTcast/Icecast2 Audio Streams.
Technical protection considerations According to the IANA profile, "It is the responsibility of the decoder/player client to respect and apply appropriate file security and protection against any potential malicious content." USAC data may include frames generated without the optional cyclic redundancy check (CRC). USAC's tag fields can transfer arbitrary material, including executable content. USAC objects are not signed or encrypted internally.

Quality and functionality factors

Sound
Normal rendering Good support and strong adoption for a wide variety of use cases and industries.
Fidelity (high audio resolution) This format was created to support both high and lower-quality bitrates and still be well-suited for both music and speech.
Multiple channels ISO/IEC 23003-3:2020 supports single and multi-channel coding at high bitrates and provides perceptually transparent quality.

File type signifiers and format identifiers

Tag  Value  Note
Filename extension  loas, xhe  According to IANA. The extension "loas" is listed as preferred.
Internet Media Type  audio/usac  See IANA.
Magic numbers  See note.  None. See IANA.
Pronom PUID  See note.  PRONOM has no corresponding entry as of March 2024.
Wikidata Title ID  Q2494305  See https://www.wikidata.org/wiki/Q2494305
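Since no magic number is listed, the filename extension is the main lightweight signal for identifying these files. As a minimal sketch, the extensions and media type given above can be registered with Python's standard `mimetypes` module so that ordinary lookups recognize them. The helper name `register_usac_types` and the sample filenames are hypothetical.

```python
import mimetypes

USAC_MEDIA_TYPE = "audio/usac"
USAC_EXTENSIONS = (".loas", ".xhe")  # "loas" is listed as preferred


def register_usac_types():
    """Map the USAC filename extensions to the audio/usac media type."""
    for ext in USAC_EXTENSIONS:
        mimetypes.add_type(USAC_MEDIA_TYPE, ext)


register_usac_types()
for name in ("interview.loas", "broadcast.xhe"):
    media_type, _encoding = mimetypes.guess_type(name)
    print(name, media_type)
```

An extension match only suggests the format. Confirming that a file really carries USAC audio requires parsing the bitstream itself.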

Notes

General  
History

MPEG issued a call for proposals on unified speech and audio coding in October 2007, seeking a codec that performs well with speech, mixed speech and music, and music, at rates from 12 kilobits per second (kbps) for mono signals up to 64 kbps for stereo signals. In July, MPEG evaluated the responses and selected the best proposal. Following this review, MPEG members worked in collaboration to increase the performance until, in early 2012, ISO/IEC JTC1/SC29/WG11 (MPEG) finalized the new MPEG-D Unified Speech and Audio Coding standard.

The AES Convention Paper 8654 noted that care was taken to keep the codec as lean as possible.


Format specifications


Useful references

URLs


Last Updated: 03/26/2024