Sustainability of Digital Formats: Planning for Library of Congress Collections

Speex Audio Codec, Version 1.2

Format Description Properties

Identification and description

Full name Speex Audio Codec, Version 1.2
Description Speech codec designed for packet networks and voice over IP (VoIP) applications rather than for mobile phones; file-based compression is also supported. The codec is based on Code Excited Linear Prediction (CELP) and supports a wide range of speech quality levels and bit-rates. Because of its VoIP-oriented design, Speex is robust to lost packets but not to corrupted ones. Because Speex targets a wide range of devices, its memory footprint is modest, and its computational complexity, which is variable, can also be kept modest.
Production phase Generally used for final-state, end-user delivery.
Relationship to other formats
    Used by Ogg_SPX, Ogg Speex Audio Format
    Affinity to CELP, Code Excited Linear Prediction. Not documented at this Web site at this time.

Local use

LC experience or existing holdings In 2007, consideration was being given to the use of Ogg_SPX for service copies of oral history recordings for access via the Web.
LC preference LCPM (Linear Pulse Code Modulated Audio) preferred for master copies.

Sustainability factors

Disclosure Fully documented. Developed by the Xiph.Org Foundation as an open-source, patent-free project.
    Documentation The Speex Codec Manual, Version 1.2 Beta 2, May 22, 2007.
Adoption See Ogg.
    Licensing and patents The specification provides the license in Appendix D. It is modeled on the BSD (Berkeley Software Distribution) family of permissive, near-public-domain software licenses. Paraphrasing Appendix D: source code and binaries may be redistributed free of charge, provided that redistributions retain or reproduce the copyright notice, the list of conditions, and the disclaimer; neither the name of the Foundation nor the names of its contributors may be used to endorse or promote derived products without specific prior written permission.
Transparency Reading the compressed encoding depends on specialized algorithms and tools; building such tools requires considerable sophistication.
Self-documentation See Ogg.
External dependencies None.
Technical protection considerations See Ogg.

Quality and functionality factors

Normal rendering Good support.
Fidelity (high audio resolution) Speex compression is designed for intelligible speech, not for a rich representation of the full audio spectrum and dynamic range. Paraphrased from the specification: CELP was selected as the encoding technique; it scales well to both low bit-rates (e.g., DoD CELP at 4.8 kbps) and high bit-rates (e.g., G.728 at 16 kbps). Speex is designed for three sampling rates, 8 kHz, 16 kHz, and 32 kHz, referred to respectively as narrowband (telephone quality), wideband, and ultra-wideband. Most of the time, encoding is controlled by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation the quality parameter is an integer, while in variable bit-rate (VBR) operation it is a float. An average bit-rate (ABR) mode also dynamically adjusts VBR quality to meet a specific target bit-rate. Bit-rate management is important in VoIP, where the maximum bit-rate must be low enough for the communication channel.
Multiple channels Provides intensity stereo coding. 1
Support for user-defined sounds, samples, and patches None.
Functionality beyond normal rendering Not investigated at this time.
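The quality and bit-rate controls described under Fidelity can be illustrated with a minimal Python sketch. The three sampling rates and band names, the 0 to 10 quality range, and the integer (CBR) versus float (VBR) distinction come from the description; the function names, the linear bit-rate model in the demo, and the feedback rule in `abr_step` are hypothetical and are neither the libspeex API nor its actual ABR algorithm.

```python
# Sketch of the Speex encoder controls described above (hypothetical names).
# From the description: three sampling rates, quality 0..10, integer quality
# for CBR, float quality for VBR, and ABR steering VBR quality toward a target.
# ASSUMPTION: the feedback rule in abr_step is illustrative, not libspeex's.

BANDS = {8000: "narrowband", 16000: "wideband", 32000: "ultra-wideband"}


def band_for(sample_rate_hz: int) -> str:
    """Map one of the three supported sampling rates to its band name."""
    if sample_rate_hz not in BANDS:
        raise ValueError(f"Speex is designed for 8, 16 or 32 kHz, not {sample_rate_hz} Hz")
    return BANDS[sample_rate_hz]


def check_quality(quality, vbr: bool) -> float:
    """Validate a quality setting: integer for CBR, float allowed for VBR, range 0..10."""
    if not vbr and (isinstance(quality, bool) or not isinstance(quality, int)):
        raise TypeError("CBR quality must be an integer")
    q = float(quality)
    if not 0.0 <= q <= 10.0:
        raise ValueError("quality must lie between 0 and 10")
    return q


def abr_step(vbr_quality: float, measured_bps: float, target_bps: float,
             gain: float = 2.0) -> float:
    """One step of a toy ABR controller (hypothetical proportional rule).

    Lowers the float VBR quality when the measured average bit-rate overshoots
    the target, raises it when it undershoots, and clamps the result to 0..10.
    """
    error = (target_bps - measured_bps) / target_bps  # relative shortfall
    return min(10.0, max(0.0, vbr_quality + gain * error))


if __name__ == "__main__":
    # Hypothetical encoder behaviour: average bit-rate grows linearly with quality.
    def model_bps(q: float) -> float:
        return 2000.0 + 2000.0 * q

    quality = 8.0
    for _ in range(30):
        quality = abr_step(quality, model_bps(quality), target_bps=15000.0)
    print(band_for(16000), f"VBR quality settled near {quality:.2f}")
```

Working on the float VBR quality lets the controller make fractional corrections, so the average can settle on the target instead of jumping between integer CBR settings.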

File type signifiers and format identifiers

Tag | Value | Note
Internet Media Type | audio/x-speex | For Speex-in-Ogg, from the main part of the specification. However, recommended practice now appears to be to use the codecs parameter described in RFC 5334.
Internet Media Type | audio/ogg; codecs=speex | From RFC 5334.
Internet Media Type | audio/speex | Proposed in various drafts of RTP Payload Format for the Speex Codec, e.g., (obsolete and hence not actively linked from this format description).
Other | Speex | Ogg codec identifier: an 8-character string ("Speex" followed by 3 trailing spaces) placed within the Ogg container at the beginning of the first header page to identify the codec. See IETF RFC 5334.
Microsoft FOURCC | spex |
Other | speex | Codec identifier, long ID. From
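The identifiers in the table above can be checked programmatically. The following Python sketch assumes the input begins at the first Ogg page; it reads the page's fixed 27-byte header and segment table and tests whether the first packet begins with the 8-byte `Speex   ` identifier, and it also recognizes the Internet Media Types listed above. The function names are hypothetical, the page CRC is not verified, and this is a heuristic check, not a full Ogg demuxer.

```python
# Hypothetical helpers for recognising Speex data by the identifiers listed above.
# ASSUMPTION: `page` starts at an Ogg page boundary; the page CRC is not checked.

SPEEX_ID = b"Speex   "  # "Speex" plus 3 trailing spaces = 8 bytes
SPEEX_MEDIA_TYPES = {"audio/x-speex", "audio/speex"}


def first_packet_start(page: bytes) -> bytes:
    """Return the bytes of the first packet on the Ogg page at page[0]."""
    if page[:4] != b"OggS":
        raise ValueError("missing 'OggS' capture pattern")
    n_segments = page[26]                  # last byte of the 27-byte fixed header
    lacing = page[27:27 + n_segments]      # segment table (lacing values)
    start = 27 + n_segments                # packet data follows the segment table
    length = 0
    for value in lacing:
        length += value
        if value < 255:                    # a lacing value below 255 ends the packet
            break
    # If every value is 255 the packet continues on the next page; the bytes
    # returned here are still enough to check the 8-byte identifier.
    return page[start:start + length]


def is_ogg_speex(page: bytes) -> bool:
    """True if the first packet on the page starts with the Speex identifier."""
    try:
        return first_packet_start(page).startswith(SPEEX_ID)
    except (ValueError, IndexError):       # not Ogg, or truncated header
        return False


def is_speex_media_type(media_type: str) -> bool:
    """Accept audio/x-speex, audio/speex, or audio/ogg with codecs including speex."""
    main, _, params = media_type.partition(";")
    main = main.strip().lower()
    if main in SPEEX_MEDIA_TYPES:
        return True
    if main != "audio/ogg":
        return False
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "codecs":
            codecs = value.strip().strip('"').lower()
            return "speex" in (c.strip() for c in codecs.split(","))
    return False


if __name__ == "__main__":
    packet = SPEEX_ID + b"1.2" + bytes(69)          # synthetic 80-byte header packet
    page = (b"OggS" + bytes([0, 2]) + bytes(20)     # version, BOS flag, granule/serial/seq/CRC
            + bytes([1, len(packet)]) + packet)     # one segment of 80 bytes
    print(is_ogg_speex(page), is_speex_media_type("audio/ogg; codecs=speex"))
```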

Notes


Format specifications

Useful references


1 Intensity stereo as explained in the Wikipedia article Joint (audio engineering) (consulted August 24, 2007): "More specifically, the dominance of inter-aural time differences (ITD) for sound localization by humans is only given for lower frequencies. That leaves inter-aural amplitude differences (IAD) as the dominant location indicator for higher frequencies. The idea of intensity stereo coding is to merge the upper spectrum into just one channel (thus reducing overall differences between channels) and to transmit a little side information about how to pan certain frequency regions to recover the IAD cues."
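The intensity stereo idea quoted in note 1 can be sketched in Python. Assuming each channel is already available as a list of spectral coefficients (a hypothetical input; the split index and function names are also illustrative), the sketch keeps the lower coefficients as true stereo, merges the upper region into one channel, and transmits one gain per channel so the decoder can restore each channel's energy, that is, the inter-aural amplitude difference. This is a generic illustration of the technique, not the stereo scheme in the Speex bitstream.

```python
import math

# Generic intensity stereo sketch (hypothetical names; not the Speex bitstream).
# Input: per-channel spectral coefficients, lowest frequency first.


def encode_band(left, right):
    """Merge a band into one channel plus a gain per channel (side information)."""
    mono = [(lc + rc) / 2.0 for lc, rc in zip(left, right)]
    e_mono = sum(m * m for m in mono)
    if e_mono == 0.0:  # e.g. fully out-of-phase channels cancel in the downmix
        return mono, 0.0, 0.0
    g_left = math.sqrt(sum(lc * lc for lc in left) / e_mono)
    g_right = math.sqrt(sum(rc * rc for rc in right) / e_mono)
    return mono, g_left, g_right


def decode_band(mono, g_left, g_right):
    """Pan the merged band back out; restores each channel's band energy
    unless the downmix cancelled to zero."""
    return [g_left * m for m in mono], [g_right * m for m in mono]


def intensity_stereo(left, right, split):
    """Keep coefficients below `split` as true stereo; merge the rest."""
    mono, g_left, g_right = encode_band(left[split:], right[split:])
    high_left, high_right = decode_band(mono, g_left, g_right)
    return left[:split] + high_left, right[:split] + high_right


if __name__ == "__main__":
    left = [1.0, 2.0, 3.0, 4.0]
    right = [1.0, -1.0, 1.0, 2.0]
    out_left, out_right = intensity_stereo(left, right, split=2)
    print(out_left, out_right)
```

The decoded upper region keeps each channel's energy, and hence the level difference between channels, but both channels now share the same waveform shape; this is why the technique is reserved for the higher frequencies, where inter-aural time differences matter less.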

Last Updated: Wednesday, 22-Feb-2017 13:45:55 EST