Sound: Quality and Functionality Factors
Table of Contents
• Normal rendering for sound
• Fidelity (support for high audio resolution)
• Support for multiple channels
• Support for downloadable or user-defined sounds, samples, and patches
• Functionality beyond normal sound rendering
Within the structured audio format category, the focus for this Web site is on note-based formats generated by systems for music composition and also used for playback. The most prominent note-based formats are associated with MIDI, the Musical Instrument Digital Interface, although formats called MODs (from "modules"), sometimes called tracker files, also have many devotees. Other formats contain data used by synthesizers that simulate the human voice, e.g., for telephone company directory services. Since voice-synthesis content is less likely to be added to the collections of the Library of Congress, this topic is not discussed here. This document also omits discussion of compound documents that combine sound, images, and other forms of expression, and of multi-track recordings in which individual tracks are managed as separate bitstreams (files). The latter subject belongs to what this Web site calls bundling or packaging formats, e.g., AES-31 and METS.
Normal rendering for sound
Normal rendering must not be limited to specific hardware models or devices and must be feasible for current and future users and scholars. This level of functionality is expected of any candidate digital format for preserving sound content, so it is not treated as a factor for choosing among formats.
Fidelity (support for high audio resolution)
The two characteristics most often associated with fidelity for sound waveforms represented as Linear Pulse Code Modulated (LPCM) data are sampling frequency and word length (i.e., bit depth).[1] Other factors may also influence fidelity, such as the presence of distortion, watermarking, or (in lossy compressed renderings derived from LPCM files) audible artifacts that result from the application of lossy compression. In general, uncompressed or lossless-compressed data offers the highest fidelity; however, lossy compression based on models of human auditory perception can provide a high level of perceived fidelity under normal playback conditions. New techniques for lossy compression are being developed, tested, and standardized, and specifications for what constitutes high-quality lossy compression can be expected to change over time.[2]
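The two fidelity parameters named above translate directly into data rate, frequency range, and dynamic range. The Python sketch below shows that arithmetic; the function names are hypothetical, and the 96 kHz/24-bit values are illustrative examples, not a Library recommendation. It relies on two standard rules of thumb: the highest representable frequency is half the sampling frequency (the Nyquist limit), and an ideal N-bit quantizer gives a signal-to-noise ratio of about 6.02 × N + 1.76 dB for a full-scale sine wave.

```python
import math


def lpcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed LPCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent without aliasing."""
    return sample_rate_hz / 2


def ideal_snr_db(bit_depth: int) -> float:
    """Theoretical quantization SNR of a full-scale sine: ~6.02 dB per bit + 1.76 dB."""
    return 20 * math.log10(2) * bit_depth + 10 * math.log10(1.5)


# Illustrative stereo examples: CD audio and a hypothetical high-resolution master.
for label, rate_hz, bits in [("CD audio", 44_100, 16), ("96 kHz/24-bit", 96_000, 24)]:
    print(
        f"{label}: {lpcm_bit_rate(rate_hz, bits, 2):,} bit/s, "
        f"up to {nyquist_limit_hz(rate_hz):,.0f} Hz, "
        f"~{ideal_snr_db(bits):.1f} dB SNR"
    )
```

For CD audio this gives 1,411,200 bit/s, a 22,050 Hz ceiling, and roughly 98 dB; real-world equipment, especially at 24 bits, typically falls short of the theoretical figure.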
The effects of lossy compression may be reflected in other factors relating to the choice of digital formats. In general, lossy compression detracts from transparency, which is desirable in any format chosen for long-term preservation. Also, constraints on data transfer rates may require balancing support for multi-channel audio (surround sound) against fidelity limitations in individual channels.
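The balancing act described above can be made concrete with a small calculation. The Python sketch below computes how far an LPCM stream must be compressed to fit a fixed transfer budget and then splits that budget unevenly across channels. The 1,536,000 bit/s budget, the 48 kHz/24-bit source, and the channel weights are hypothetical values chosen for illustration; real codecs allocate bits adaptively rather than by fixed weights.

```python
from typing import Dict


def lpcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed LPCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def required_compression_ratio(lpcm_bps: float, budget_bps: float) -> float:
    """Factor by which the LPCM stream must shrink to fit the budget."""
    return lpcm_bps / budget_bps


def allocate_budget(budget_bps: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a fixed bit budget across channels in proportion to (hypothetical) weights."""
    total = sum(weights.values())
    return {name: budget_bps * w / total for name, w in weights.items()}


BUDGET_BPS = 1_536_000  # hypothetical real-time transfer budget

stereo = lpcm_bit_rate(48_000, 24, 2)
surround = lpcm_bit_rate(48_000, 24, 6)  # 5.1 = six channels
print(required_compression_ratio(stereo, BUDGET_BPS))    # 1.5
print(required_compression_ratio(surround, BUDGET_BPS))  # 4.5

# Favor front left/right, as the text says many surround formats do; LFE gets the least.
weights = {"FL": 2, "FR": 2, "C": 1, "LS": 1, "RS": 1, "LFE": 0.5}
for channel, bps in allocate_budget(BUDGET_BPS, weights).items():
    print(f"{channel}: {bps:,.0f} bit/s")

# Footnote 2's 256 kbit/s stereo MP3 versus CD-quality LPCM:
print(required_compression_ratio(lpcm_bit_rate(44_100, 16, 2), 256_000))  # about 5.5
```

The same budget that needs only 1.5:1 compression for stereo needs 4.5:1 for six channels, which is why individual surround channels often end up with less fidelity.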
The treatment of spoken-word content or artificially generated speech in the context of telephony (and related activities) has a special character, using coding techniques such as µ-law companding, LPC (linear predictive coding), and GSM (Global System for Mobile Communications) compression. Content from these modes of communication is unlikely to find its way into Library of Congress collections.[3]
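µ-law, one of the telephony techniques named above, is a companding scheme: it compresses the amplitude scale logarithmically before quantization, so that quiet speech keeps more resolution than a uniform 8-bit scale would give it. The Python sketch below uses the continuous µ-law curve with µ = 255, the value used in North American and Japanese telephony. It is an approximation: ITU-T G.711 actually specifies a segmented, piecewise-linear version with its own 8-bit code layout, and the `quantize_8bit` helper here is a hypothetical simplification.

```python
import math

MU = 255  # µ value used in North American and Japanese telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Continuous µ-law curve: maps a sample in [-1, 1] onto [-1, 1] logarithmically."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must be in [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize_8bit(y: float) -> int:
    """Hypothetical simplification: uniform signed code from -127 to 127."""
    return round(y * 127)


quiet = 0.01  # a sample at 1% of full scale
print(quantize_8bit(quiet))                   # 1: uniform 8-bit quantization
print(quantize_8bit(mu_law_compress(quiet)))  # 29: after µ-law companding
print(mu_law_expand(mu_law_compress(quiet)))  # back to approximately 0.01
```

A sample at 1% of full scale thus uses roughly 23% of the companded code range instead of a single code step.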
Support for multiple channels
Waveform bitstreams generally encode multiple channels in interleaved or matrixed structures.[4] For example, stereo or two-channel sound represented as linear PCM content typically employs an interleaved structure (alternating the samples from the two channels), while some surround-sound or multi-channel formats matrix the additional channels into two interleaved channels that are decoded at playback time. (Multiplexing may also be used; this encoding can be compared to the heterodyning used in radio broadcasting.) The terminology for surround sound includes 5.1, which provides five full-range directional channels (front left, center, front right, left surround, and right surround) plus one band-limited, non-directional low-frequency effects channel (the ".1") for a subwoofer; 7.1, which adds two more surround channels; and so on. In many cases, the practical constraints of digital data transfer rates during real-time playback force the creators of these formats to limit the fidelity of some channels, typically channels other than front left and front right.
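The two multi-channel structures described above can be sketched in a few lines of Python; all function names are hypothetical. Interleaving stores one sample from each channel per frame (L, R, L, R, ...). The matrix example is a deliberately simplified passive 4-2-4 scheme that folds left, right, center, and surround into two channels (Lt/Rt) with -3 dB gains; commercial matrix systems such as Dolby Surround also phase-shift and band-limit the surround signal, and Pro Logic decoders add active steering, none of which is modeled here.

```python
from typing import Dict, List, Sequence, Tuple

G = 2 ** -0.5  # about 0.707, a -3 dB gain


def interleave(channels: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-channel sample lists into frames: ch0, ch1, ..., ch0, ch1, ..."""
    if len({len(ch) for ch in channels}) > 1:
        raise ValueError("all channels must have the same length")
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(samples: Sequence[int], num_channels: int) -> List[List[int]]:
    """Split an interleaved stream back into one list per channel."""
    if len(samples) % num_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    return [list(samples[i::num_channels]) for i in range(num_channels)]


def matrix_encode(l: float, r: float, c: float, s: float) -> Tuple[float, float]:
    """Fold four signals into two: center in phase, surround out of phase."""
    return l + G * c + G * s, r + G * c - G * s


def passive_decode(lt: float, rt: float) -> Dict[str, float]:
    """Sum/difference decode; channel separation is poor without active steering."""
    return {"L": lt, "R": rt, "C": G * (lt + rt), "S": G * (lt - rt)}


left, right = [1, 2, 3], [-1, -2, -3]
stream = interleave([left, right])
print(stream)                   # [1, -1, 2, -2, 3, -3]
print(deinterleave(stream, 2))  # [[1, 2, 3], [-1, -2, -3]]

# A center-only signal decodes to C of about 1.0 and S of 0.0,
# but L and R each also show about 0.707 (crosstalk).
print(passive_decode(*matrix_encode(0.0, 0.0, 1.0, 0.0)))
```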
Note-based files formatted in the General MIDI System can be organized in as many as sixteen channels, allowing separate "instruments" to play simultaneously for a polyphonic effect; these channels may additionally be structured to represent aural space. Some composers have even positioned loudspeakers or synthesized instruments on a stage and then played note-based files through a controller, thereby replicating the sound of an ensemble.
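At the byte level, each MIDI channel message carries its channel number in the low four bits of the status byte, which is why there are exactly sixteen channels. The Python sketch below builds a few such messages following the MIDI 1.0 message layout; the helper names are hypothetical. Channels are numbered 0-15 in the data and conventionally shown to users as 1-16. General MIDI reserves channel 10 (index 9) for percussion, and the Pan controller (controller number 10) is one way a note-based file can place an instrument in the stereo field. In a Standard MIDI File each message would also be preceded by a delta-time, which is omitted here.

```python
NOTE_OFF = 0x80
NOTE_ON = 0x90
CONTROL_CHANGE = 0xB0
PROGRAM_CHANGE = 0xC0
PAN = 10  # controller number for pan: 0 = left, 64 = center, 127 = right


def _data_byte(value: int) -> int:
    """MIDI data bytes are 7-bit values (0-127)."""
    if not 0 <= value <= 127:
        raise ValueError(f"data byte out of range: {value}")
    return value


def _status_byte(message_type: int, channel: int) -> int:
    """Message type in the high nibble, channel (0-15) in the low nibble."""
    if not 0 <= channel <= 15:
        raise ValueError(f"channel must be 0-15, got {channel}")
    return message_type | channel


def note_on(channel: int, note: int, velocity: int) -> bytes:
    return bytes([_status_byte(NOTE_ON, channel), _data_byte(note), _data_byte(velocity)])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    return bytes([_status_byte(NOTE_OFF, channel), _data_byte(note), _data_byte(velocity)])


def control_change(channel: int, controller: int, value: int) -> bytes:
    return bytes([_status_byte(CONTROL_CHANGE, channel), _data_byte(controller), _data_byte(value)])


def program_change(channel: int, program: int) -> bytes:
    return bytes([_status_byte(PROGRAM_CHANGE, channel), _data_byte(program)])


# A tiny "ensemble": piano panned left, violin panned right, snare on the drum channel.
messages = [
    program_change(0, 0),         # GM program 1: Acoustic Grand Piano
    control_change(0, PAN, 16),
    program_change(1, 40),        # GM program 41: Violin
    control_change(1, PAN, 112),
    note_on(0, 60, 100),          # middle C (note 60)
    note_on(1, 67, 90),           # G above middle C
    note_on(9, 38, 110),          # GM channel 10, key 38: Acoustic Snare
]
print(b"".join(messages).hex(" "))
```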
Support for downloadable or user-defined sounds, samples, and patches
Functionality beyond normal sound rendering
A certain level of navigation is expected in normal rendering, but some formats provide or link to data that supports more elaborate capabilities, e.g., playlists, texts for recorded books, or descriptive information such as the names of authors, performers, and narrators, the names of chapters or sections, and other information of the kinds familiar to library users.
Normal rendering is provided by formats as presented to end users. Normal rendering may not be provided (or at least not conveniently provided) by waveform formats that support rich-data representations of sound, i.e., representations sometimes called masters in preservation-oriented activities. For example, rich-data versions of a linear PCM sound item may use very high sampling frequencies and/or long word lengths, and may consist of bundled packages that contain synchronized multi-track elements. It may not be possible to "play" such a file in real time on end-user equipment over a network. Such masters are typically used for repurposing and for producing end-user forms of the same content that contain less data but offer convenient playability. The relationship between rich-data formats and normal rendering is that the rich-data item can be used to produce an end-user copy that successfully supports normal rendering.
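As one example of producing an end-user copy from a rich-data master, the Python sketch below reduces 24-bit samples to a 16-bit word length using triangular (TPDF) dither, a common practice that trades correlated quantization distortion for a low, benign noise floor. The function name is hypothetical, samples are assumed to be signed integers in a plain list, and sample-rate conversion (e.g., 96 kHz to 44.1 kHz), which needs a proper anti-aliasing low-pass filter, is deliberately not shown.

```python
import random
from typing import Iterable, List

INT16_MIN, INT16_MAX = -32768, 32767
STEP = 1 << 8  # one 16-bit step spans 256 units of 24-bit resolution


def requantize_24_to_16(samples: Iterable[int], rng: random.Random) -> List[int]:
    """Reduce signed 24-bit samples to signed 16-bit with TPDF dither."""
    out = []
    for s in samples:
        # The sum of two independent uniform values, each 1 LSB wide, gives
        # triangular dither spanning -1..+1 LSB of the 16-bit target.
        dither = (rng.random() + rng.random() - 1.0) * STEP
        q = round((s + dither) / STEP)
        out.append(max(INT16_MIN, min(INT16_MAX, q)))  # clip at full scale
    return out


master = [0, 1_000, -1_000, 4_194_304, 8_388_607, -8_388_608]  # 24-bit values
access_copy = requantize_24_to_16(master, random.Random(2004))
print(access_copy)
```

Seeding the random generator makes the output reproducible, which can be useful when documenting how an access copy was derived.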
[1] An alternative to LPCM has been implemented by Sony, which calls it DSD (Direct Stream Digital). DSD is based on delta-sigma (or sigma-delta) modulation and produces a one-bit, pulse-density-modulated (PDM) signal; in the Sony implementation it employs a data rate of 2.8224 megabits per second per channel (64 times the 44.1 kHz sampling frequency of audio CDs). At this writing, DSD is heard only on SACDs (Super Audio Compact Discs), and the writers of this document are not aware of any media-independent DSD format. Comments in the engineering press vary; some state a clear preference for high-resolution LPCM sampling, while others politely say that both approaches will be acceptable to most audiophiles.
[2] As of 2003, the Library of Congress Motion Picture, Broadcasting and Recorded Sound Division considers an MP3 file compressed to a playback data rate of 128 kilobits/second per channel (256 kilobits/second for stereo) to be at the lower end of acceptable quality for published music. Lower-quality compression would be acceptable for culturally significant but home-produced sound.
[3] Speech-compression formats are used for some digital recorded books and other recordings. For example, the downloadable books and radio broadcasts available from Audible, Inc. (http://www.audible.com) are offered in five formats, including formats based on speech compression and others based on the formats typically used for music, i.e., MP3. The Library's preference is likely to be for the higher-quality version (MP3-based, in the case of Audible, Inc.) rather than a lower-quality, speech-compressed version.
[4] Stereo or surround sound may also be represented by multiple separate waveform files synchronized within wrapper formats like AES-31 or SMIL 2.1 (Synchronized Multimedia Integration Language, Version 2.1). These wrapped sets of files may represent the aural space of a sound field, or may simply contain alternate content, like English- and Spanish-language versions, narrative commentaries on a parallel stream, or the like. Multi-file wrapper formats are not covered in this document. The AES-31 format, apparently not widely adopted at this time, is intended for management of multi-track sound content produced in recording studios, typically with the intention of later mixing to stereo or surround.
[5] Two techniques for sound generation are employed, depending on the format: direct inclusion of waveform segments and synthesis. The preferred method for synthesis of instrument sounds (in 2004) is wavetable synthesis. Formats exist for the exchange of wavetable sounds (sometimes called sound fonts) independent of music formats. An older synthesis technique is frequency modulation (FM) synthesis.
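The two synthesis techniques named in footnote 5 can be illustrated with a short Python sketch; all names and parameter values are hypothetical. The wavetable function shows the core idea, reading a stored waveform at a rate set by the desired pitch, with linear interpolation between table entries. In sound-card and sound-font practice the stored waveform is usually a recorded instrument sample with loop points and envelopes, which this sketch omits and replaces with a single-cycle sine table. The FM function is the textbook two-operator form, sin(2π·fc·t + I·sin(2π·fm·t)).

```python
import math
from typing import List


def make_sine_table(size: int = 2048) -> List[float]:
    """One cycle of a sine wave, stored as a lookup table."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]


def wavetable_oscillator(table: List[float], freq_hz: float,
                         sample_rate_hz: int, num_samples: int) -> List[float]:
    """Step through the table at a pitch-dependent increment, interpolating linearly."""
    size = len(table)
    increment = freq_hz * size / sample_rate_hz  # table positions per output sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a, b = table[i], table[(i + 1) % size]
        out.append(a + (b - a) * frac)
        phase = (phase + increment) % size
    return out


def fm_tone(carrier_hz: float, modulator_hz: float, index: float,
            sample_rate_hz: int, num_samples: int) -> List[float]:
    """Two-operator FM: the modulator varies the carrier's phase; index sets sideband strength."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        out.append(math.sin(2 * math.pi * carrier_hz * t
                            + index * math.sin(2 * math.pi * modulator_hz * t)))
    return out


table = make_sine_table()
a440 = wavetable_oscillator(table, 440.0, 44_100, 44_100)  # one second at 440 Hz
fm = fm_tone(200.0, 280.0, 5.0, 44_100, 44_100)            # 200 Hz carrier, 280 Hz modulator
```

With a modulation index of 0, the FM function reduces to a plain sine at the carrier frequency; raising the index spreads energy into more sidebands.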