National Information Standards Organization

Digital Talking Book Standards Committee

- Playback Device Guidelines -

Prioritized List of Features for Digital Talking Book Playback Devices

Version Date: December 30, 1999



  1. General
  2. Physical Characteristics
  3. Audio Characteristics
  4. Power
  5. Media
  6. User Interface
  7. Document Navigation
  8. Help Functions
  9. File/Stream Format
  10. Compatibility
  11. Multilingual Functions
  12. Presentation Functions
  13. Copyright
  14. Administrative Aspects

The National Information Standards Organization (NISO) is a nonprofit association accredited as a standards developer by the American National Standards Institute. At the initiative of the National Library Service for the Blind and Physically Handicapped, Library of Congress, NISO Standards Committee "AQ" was formed in March, 1997, to develop a national standard for a digital talking book (DTB) for blind and physically handicapped readers. A DTB is envisioned to be, in its fullest implementation, a group of digitally-encoded files containing an audio portion recorded in human speech; the full text of the work in electronic form, marked with the tags of a descriptive markup language; and a linking file that synchronizes the text and audio portions. As this document illustrates, such a structure will allow the DTB user a broad range of capabilities not possible with current talking books.

The NISO DTB committee developed this list of features desired by users for a DTB playback device. It is intended for use as a guideline by organizations designing DTB players. The committee recommends that three types of playback device be developed, as described below:

  1. Basic DTB player: This machine would be a portable unit capable of playing digital audio recordings. Its primary characteristic would be its ease of operation. It would therefore include a limited number of controls and functions. It would not include the capability to access a full-text file that might accompany the audio file. It would be used mostly by less sophisticated talking-book readers who wish to read primarily in a linear fashion. Both this device and the following one are envisioned as containing a speaker of sufficient quality for long-term listening. Neither is seen as an "ultra-portable" which requires the use of headphones. There will certainly be a need for such very lightweight players, but they are not the subject of these guidelines.

  2. Advanced DTB player: This device would also be portable but would be designed for use by students, professionals, and others who wish to access documents randomly, use sophisticated navigation tools, set bookmarks, etc. It would include most of the features found in the computer-based version.

  3. Computer-based DTB player: This player would consist only of software and would operate on a user-supplied personal computer. It would provide the most complete and sophisticated suite of features of the three players.
Each feature listed below was assigned one of the following three priorities for each of the three types of player:
A. Essential
B. Highly desirable
C. Useful
So the designation 1C, 2B, 3A following a feature indicates that it was classified as useful for the level 1 player, highly desirable for the level 2 unit, and essential for the level 3 playback device. A digital talking book that is compliant with NISO standard X?X must be able to play on all three types of players, but with different levels of functionality.
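
As an illustration of the notation only, a designation string such as "1C, 2B, 3A" can be read mechanically into a per-player mapping. The sketch below is not part of the guidelines; the names PRIORITY_LABELS and parse_designation are assumptions made for this example.

```python
import re

# Priority labels as defined in the list above.
PRIORITY_LABELS = {"A": "Essential", "B": "Highly desirable", "C": "Useful"}


def parse_designation(text):
    """Map each player type (1, 2, 3) to its priority label.

    Hypothetical helper for illustration. Combined designations such as
    "2A/B" are read as their first priority only.
    """
    result = {}
    for player, priority in re.findall(r"([123])([ABC])\b", text):
        result[int(player)] = PRIORITY_LABELS[priority]
    return result


print(parse_designation("1C, 2B, 3A"))
# {1: 'Useful', 2: 'Highly desirable', 3: 'Essential'}
```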
1. General

1.1 Simplest version, no more complex than C-1 (1B)
The basic device is no more difficult to operate than the standard NLS C-1 cassette player.

1.2 ???Meets accessibility guidelines for telecommunications devices (specifically for hearing impaired) (1A, 2A, 3A)

(Reference section 508.)

2. Physical Characteristics

2.1 Weight

Device weighs less than 1.59 kg (3-1/2 pounds). (1A, 2A)
Device weighs less than 1.36 kg (3 pounds). (1B, 2B)
Device weighs less than 1.13 kg (2-1/2 pounds). (1C, 2C)

These weights are based on the assumption that the power transformer is an external unit integrated with a power cord that is detachable from the unit. The weight of the power cord and transformer is not included in the above weights.

2.2 Dimensions

Device is relatively compact, with the sum of height, width, and length not exceeding:

50.8 cm (20 inches) (1A, 2A)
45.72 cm (18 inches) (1B, 2B)
40.64 cm (16 inches) (1C, 2C)
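
A minimal sketch of how a design could be checked against the three dimension limits above, assuming the limits are inclusive ("not exceeding"). The names DIMENSION_LIMITS_CM and satisfied_dimension_levels are hypothetical and appear nowhere in the guidelines.

```python
# Limits on height + width + length in centimeters, from section 2.2.
DIMENSION_LIMITS_CM = {"A": 50.8, "B": 45.72, "C": 40.64}


def satisfied_dimension_levels(height_cm, width_cm, length_cm):
    """Return the priority levels (A, B, C) whose size limit the device meets.

    Hypothetical helper; a smaller device satisfies more levels.
    """
    total = height_cm + width_cm + length_cm
    return [level for level, limit in DIMENSION_LIMITS_CM.items() if total <= limit]


# A 20 x 15 x 8 cm unit sums to 43 cm: within the A and B limits, but not C.
print(satisfied_dimension_levels(20, 15, 8))  # ['A', 'B']
```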
2.3 Durability (1A, 2A)

Device must be at least as durable as high-quality handheld portable consumer electronics devices of similar manufacture (e.g., solid state or electromechanical).

2.4 Operates in wide range of environmental conditions (1A, 2A)

Device operates normally under conditions of extreme temperature and humidity: -20 degrees C (-4 degrees F) to 50 degrees C (122 degrees F) and 20 percent to 100 percent relative humidity (non-condensing).

2.5 Shock-resistant (Sound does not skip when device is jarred) (1A, 2A)

2.6 Spill proof (1A, 2A)

Device is constructed so as to prevent spilled liquids from entering.

2.7 Vermin-proof (1A, 2A)

Device is constructed so as to prevent infestation by vermin. Internal heat sources may attract such infestation.

2.8 If a physical medium is used, it will not be ejected if the device is dropped. (1B, 2B)

2.9 Output connectors: headphone jack (for headphones and amplified speakers), line-out jacks (1A, 2A)

Many readers use headphones or external speakers for improved sound quality.

2.10 Speaker

2.10.1 Device has a built-in speaker (1A, 2A)

2.10.2 Ability to point speaker

Many older users have impaired hearing at higher frequencies. Speakers are directional at higher frequencies, so it shall be possible to direct their output toward the user for optimal performance. (1B, 2B)

2.10.3 Speaker capacity

Speaker shall be able to reproduce a 1 kHz sine wave and generate 80 dB SPL at 1 meter with minimal distortion. (1A, 2A)

2.11 Device has a small footprint and is physically stable, with a low center of gravity. (1A, 2A)

2.12 The power transformer is integrated with a power cord that is detachable from the unit. (1B, 2B)

2.13 Device exceeds UL standards for safety (1A, 2A)

2.14 Device includes easily-distinguishable tactile markings for controls, jacks, etc. (1A, 2A)

2.15 Controls are color-coded (1A, 2A)

2.16 Print labeling is in a high-contrast, upper/lowercase, large-print, sanserif font (14 point or larger) (1A, 2A)

2.17 Device is aesthetically pleasing (1B, 2B)

2.18 Device is easy to clean (1A, 2A)

2.19 Device is easy to maintain (1A, 2A) (It may be desirable that it be repairable by volunteers)

2.20 External digital input/output connectivity

Ports and system shall enable remote diagnostics, external controls, exporting of text, sharing of bookmarks, etc. (1A, 2A)

Software must support data exchange through operating system (3A)

2.21 Non-skid feet

Device shall have non-skid feet to minimize accidental movement on low-friction surfaces. (1A, 2A)


3. Audio Characteristics

3.1 Sound quality (1A, 2A, 3A) (To be quantified)

Audio material shall be distributed in a form that provides a minimum of FM quality sound to the user. (This section will be expanded, based upon Audio Engineering Society project AES-X74, Recommended Practices for Internet Quality Descriptions.)

3.2 Volume control (1A, 2A, 3A)

The user can adjust the volume to meet hearing and listening conditions.

3.3 Tone control (1A, 2A, 3B)

The user can adjust the equalization (the amount of low, mid-range or high frequencies) to suit individual tastes and listening conditions. The device shall include a high-frequency adjustment to compensate for age-related hearing loss.

3.4 Variable speed, with pitch restoration (1A, 2A, 3A)

Playback speed is adjustable over a range from 1/3 to at least 3 times the normal playback speed, and the pitch of the narration is not altered by the speed change. The Time-Scale Modification system shall not produce audible chopping, burble, or reverberation and shall not skip over significant units of sound at high playback speeds.

3.5 Pitch restoration control; independent of speed (1B, 2B, 3B)

Although pitch restoration may be the default, the user can adjust the pitch of the narrator's voice independent of the speed to make it more intelligible at extreme speeds, or to compensate for hearing loss.

3.6 Amplitude compression (1C, 2C, 3C)

Gives the user a method of narrowing the volume range between the quietest and the loudest portions of a talking book. This would be useful when listening to the book at a distance, in a noisy environment, or for a person whose hearing has limited dynamic range. The AC-3 audio compression standard includes this function.
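
To make the idea of narrowing the volume range concrete, the sketch below applies a generic static compression curve to a single audio sample. It is not the AC-3 dynamic range control mentioned above, and it omits the attack, release, and make-up gain stages a practical compressor would need. The function name and the default threshold and ratio are assumptions chosen for illustration.

```python
import math


def compress_sample(sample, threshold_db=-20.0, ratio=4.0):
    """Apply a static downward-compression curve to one sample in [-1.0, 1.0].

    Hypothetical parameters: levels above threshold_db are reduced so that
    every `ratio` dB of input above the threshold yields 1 dB of output.
    """
    magnitude = abs(sample)
    if magnitude == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(magnitude)
    if level_db <= threshold_db:
        return sample  # quiet material passes through unchanged
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    # Convert back to linear amplitude and restore the original sign.
    return math.copysign(10.0 ** (compressed_db / 20.0), sample)


# A full-scale sample (0 dB) is lowered to -15 dB; a quiet one is untouched.
print(compress_sample(1.0), compress_sample(0.05))
```

With these settings the loudest passages drop by 15 dB while passages below -20 dB are left alone, which narrows the gap between the two.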

3.7 Stereo (3B)

Could be useful in aurally displaying spatial arrangements of information, such as matrices in T.V. Raman's ASTER system.

3.8 Balance control (stereo) (3A)


4. Power

4.1 Device operates on AC and battery power. (1A, 2A)

4.2 Smart (multivoltage) power supply (1B, 2B)

Adapts automatically to local AC voltage.

4.3 Battery saver (1A, 2A)

Player automatically manages battery discharge for maximum battery life.

4.4 Battery indicator (1A, 2A)

The user is automatically warned when the battery is low but can also query the device at any time to get a reading of current battery discharge/charge status.

4.5 Ability to accept standard batteries (1A, 2A)

Even if a special rechargeable battery pack is provided, off-the-shelf batteries can be purchased and used in the player.

4.6 Battery capacity

The user can expect the player to operate on a fully-charged battery at normal speed and 60 dB SPL measured at one meter for:

6 hours (1A, 2A)
10 hours (1B, 2B)
14 hours (1C, 2C)

5. Media

5.1 Media/data robustness (1A, 2A, 3A)

The distribution system or media will remain free of perceptible errors for a long time, under a wide variety of handling or transmission conditions, without placing unreasonable burdens on users.

5.2 One-way medium (1B, 2B, 3B)

Media would not need to be returned to the library.


6. User Interface

6.1 Controls designed for accessibility by maximum range of abilities (1A, 2A, 3A)

Controls are physically differentiable by touch; require a minimum of hand/arm strength to operate and a minimum of dexterity to manipulate, and are operable by devices such as mouthsticks.

6.2 Universal control interface (1A, 2A, 3A)

The user interface allows control by all users including, for example, those who can only access the device with a breath switch and those who require an interface in a language other than English. Users with other types of interface requirements will be accommodated.

6.3 No need to use visual display to operate device (1A, 2A, 3A)

The user interface does not require the ability to view a visual display. All interaction by the user can be done via audible feedback with input by physical controls/audio input/keyboard.

6.4 Single button control (1 button at a time) (1A, 2A, 3A)

The pressing of multiple keys simultaneously is not required; if a "shift" function is used, it must operate in a "sticky" or sequential manner where the first key is released before the second key is pressed.
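
The sequential ("sticky") shift behavior described above can be sketched as a small state machine. The class name, key names, and the choice to let a second press of Shift release the latch are assumptions for illustration, not requirements of the guidelines.

```python
class StickyShiftKeypad:
    """Hypothetical keypad model: Shift latches for exactly one following key."""

    def __init__(self):
        self.shift_latched = False

    def press(self, key):
        """Return the command produced by this key press, or None if it only latched Shift."""
        if key == "shift":
            # Assumption: pressing Shift again releases the latch.
            self.shift_latched = not self.shift_latched
            return None
        command = f"shift+{key}" if self.shift_latched else key
        self.shift_latched = False  # the latch applies to one key only
        return command


keypad = StickyShiftKeypad()
keypad.press("shift")           # released before the next key is pressed
print(keypad.press("play"))     # shift+play
print(keypad.press("play"))     # play
```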

6.5 Immediate interruption of audio messages when any key is pressed (1A, 2A, 3A)

While an audio message is playing, the pressing of any other key will stop the playing of the message immediately and the playing of any new message (or the document) will begin.

6.6 Multiple user-defined default profiles (2B, 3B)

Several user-defined configuration profiles can be created, stored, and retrieved for a variety of reading configurations.

6.7 User preferences non-volatile (1C, 2B, 3B)

Stored user preferences are saved so that they are retained when the machine loses power or is turned off.

6.8 Reset to known state, factory or user-defined (1A, 2A, 3A)

A software/hardware reset by the user is possible so the device can be returned to a previously known state (to factory default settings or a previous user-defined configuration profile).

6.9 Cancel/Undo features (2B, 3B)

"Cancel" is used for commands that are in process and have not yet been executed. For example, if the user presses the "Go To" button but then decides not to complete the command (i.e., enter the page number, etc.), he would use the "cancel" feature. The "undo" command returns the user to the point or status before a command was executed. For example, using the "undo" feature after setting a bookmark would remove that bookmark.

6.10 Continuous long play without intervention (1A, 2A, 3A)

The device has the ability to play continuously on AC power, without user intervention, the full contents of a document (e.g., book or magazine) up to the full capacity of the storage/delivery medium utilized.

6.11 Sleeping user detector (1C, 2C)

The interface includes a detector that can determine when the user has fallen asleep and automatically turn off the machine.

6.12 Locator device (1C, 2C)

The device can be located audibly by an external means such as a remote control or a hand clap.


7. Document Navigation

7.1 Start and stop (1A, 2A, 3A)

The machine/output can be started/stopped easily.

7.2 Instant stop/start (1A, 2A, 3A)

Playback can be stopped/started instantly by the user. The device can be stopped and then restarted and no audio material will be missed.

7.3 Multiple levels of granularity -- document accessible at fine level of detail (2A, 3A)

Document can be accessed at various levels of definition such as by paragraph, by sentence, or by word, depending upon user needs.

7.4 Useable table of contents/index (1A, 2A, 3A)

User can access the table of contents/index from any point in the book and select an entry for immediate access. Links from table of contents lead to the item selected. Links from index lead to the top of the page selected.

7.5 Easy skips (defined by document) (1A, 2A, 3A)

User can skip through the document by segments defined by document elements such as chapters, pages, paragraphs, etc. In this mode, the user moves through the elements sequentially, rather than jumping directly to a specific target.

7.6 Ability to move directly to a specific target

7.6.1 User can easily jump directly to a specific point in the document such as page 56 or chapter 12. The user selects or enters the name of the target and then initiates the jump. (2B, 3A)

7.6.2 User can jump directly to a location identified in the document as a target (for example a cross-reference). The user encounters a cross-reference, for example, "See Appendix 5," and activates the link to that location. When a user prompts the device to follow a link, the device launches the nearest previous link. This allows a user to activate a link even if he has not reacted immediately after being notified of the link. See section 7 of Text Navigation Features List (1A, 2A, 3A)
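
One way to picture "the nearest previous link" is a lookup over the sorted positions at which links occur in the document. This is a sketch under assumptions: positions are taken to be playback offsets in seconds, and the function name is hypothetical.

```python
import bisect


def nearest_previous_link(link_positions, current_position):
    """Return the position of the last link at or before current_position.

    link_positions must be sorted in ascending order. Returns None when no
    link has been encountered yet. Hypothetical helper for illustration.
    """
    index = bisect.bisect_right(link_positions, current_position)
    return link_positions[index - 1] if index > 0 else None


links = [12.0, 95.5, 240.0]  # illustrative offsets, in seconds
# The user reacts at 100 s, a few seconds after the link announced at 95.5 s.
print(nearest_previous_link(links, 100.0))  # 95.5
```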

7.7 Cross-Reference notification
7.7.1 The user has the option of being notified via an audible signal when a cross-reference is encountered. Default for device 1 is to disable the audio signal; for devices 2 and 3 default is to enable it. (1C, 2A, 3A)

7.7.2 User may choose among several audible indicators. (2A, 3A)

7.8 Internal and external cross-reference targets
7.8.1 The user can query the player as to whether a link leads to an internal or external target, since the decision to follow a link may depend on the target's location. (2A, 3A)

7.8.2 The user can ask for the current state of the player, so if the device is sitting quietly after an external link has been activated, the user can determine what action to take, if any. (2A, 3A)

7.8.3 If the user feels too much time has elapsed after activating a link to an external target, he or she can cancel the request without causing the system to crash. If the user needs to take some action, the player will prompt him or her. (2A, 3A)

7.8.4 The device can continue to play while retrieving external resources. (2B, 3B)

7.9 Navigation Control Center (1A, 2A, 3A)

Device includes a Navigation Control Center which allows the user to easily obtain an overview of the structure of the book and provides a convenient means for navigating through it. See sections 5 and 5.1 of the Text Navigation Features List.

7.10 Ability to retrace steps (1A, 2A, 3A)

Device maintains a "history" file of the locations the user has passed through when moving through the document in discrete steps (e.g., as described in 7.4, 7.5, and 7.6 above). The user can move backwards and forwards through that list of locations.
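
A minimal sketch of such a history list follows. The class name is hypothetical, and the rule that a new jump discards any "forward" entries is an assumption borrowed from web browsers; the guidelines do not specify it.

```python
class NavigationHistory:
    """Hypothetical history of locations reached by discrete navigation steps."""

    def __init__(self, start):
        self._locations = [start]
        self._index = 0

    @property
    def current(self):
        return self._locations[self._index]

    def visit(self, location):
        # Assumption: a new jump discards entries ahead of the current one.
        del self._locations[self._index + 1:]
        self._locations.append(location)
        self._index += 1

    def back(self):
        """Move one step back, staying put at the oldest entry."""
        if self._index > 0:
            self._index -= 1
        return self.current

    def forward(self):
        """Move one step forward, staying put at the newest entry."""
        if self._index < len(self._locations) - 1:
            self._index += 1
        return self.current


history = NavigationHistory("chapter 1")
history.visit("page 56")
history.visit("Appendix 5")
print(history.back())     # page 56
print(history.forward())  # Appendix 5
```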

7.11 Easy repeats, skips (user-defined, time-based) (2C, 3C)

User can skip through the document by time intervals set by the user.

7.12 Ability to skip over user-selected text elements (2B, 3B)

Various elements of the document such as page numbers, footnotes, picture captions, tables, sidebars, and production notes (material added by the talking book producer) can be skipped by the user. For example, the configuration profile can be set to specify that optional production notes should be skipped (required production notes cannot be skipped).

7.13 Ability to manage notes (2A, 3A)

Footnotes can be managed in a variety of ways. The user can set her configuration profile so that:

  1. footnote references and the footnotes themselves are automatically played in full,
  2. footnote references are played but the footnotes are not unless the user chooses to hear one, or
  3. footnote references and footnotes are skipped.
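
The three options above can be sketched as a profile setting that filters what is played. The enum, the item kinds ("text", "noteref", "note"), and the function name are all assumptions made for this example; on-request playback of a single footnote in the second mode is not modeled.

```python
from enum import Enum


class FootnoteMode(Enum):
    PLAY_ALL = 1         # references and footnotes played in full
    REFERENCES_ONLY = 2  # references played; footnotes only on request
    SKIP_ALL = 3         # references and footnotes skipped


def playable_items(items, mode):
    """Filter (kind, content) pairs according to the footnote mode.

    Hypothetical model: kind is "text", "noteref", or "note".
    """
    if mode is FootnoteMode.PLAY_ALL:
        skipped = set()
    elif mode is FootnoteMode.REFERENCES_ONLY:
        skipped = {"note"}
    else:
        skipped = {"noteref", "note"}
    return [content for kind, content in items if kind not in skipped]


passage = [("text", "First sentence."), ("noteref", "note 1"),
           ("note", "Text of note 1."), ("text", "Second sentence.")]
print(playable_items(passage, FootnoteMode.REFERENCES_ONLY))
# ['First sentence.', 'note 1', 'Second sentence.']
```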
7.14 Reading of notes (1A, 2A, 3A)

At any time during the reading of a note, the user shall be able to interrupt the reading and return to the point in the text immediately following the note reference.

7.15 Setting and Labeling Bookmarks

7.15.1 User can set one or more bookmarks for later access. Bookmarks are saved even when machine is turned off and are deleted only upon user initiation. The device accommodates at least 100 simple bookmarks (those that simply mark a point but do not label it). (This limit does not apply to the type 3 device, where the capacity of the hard drive would set the limit.) (1A, 2A)

7.15.2 User can tag bookmarks with text or voice labels. The same label can be assigned to multiple bookmarks to create a set of related bookmarks. The user can browse through all existing bookmarks. (See section 9 of Text Navigation Features List.)

The device has sufficient capacity to store the following volume of voice-labeled bookmarks at telephone quality (8 kHz sample rate, 8-bit samples):
15 minutes (2A)
30 minutes (2B)
60 minutes (2C)
(This limit does not apply to the type 3 device, where the capacity of the hard drive would set the limit.)
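
For a sense of scale, the voice-label capacities above can be converted to bytes. The arithmetic below assumes uncompressed, single-channel audio, which the guidelines do not state; a compressed format would need less space. The function name is hypothetical.

```python
def voice_storage_bytes(minutes, sample_rate_hz=8000, bits_per_sample=8, channels=1):
    """Bytes needed for `minutes` of uncompressed audio (assumed mono PCM)."""
    return minutes * 60 * sample_rate_hz * bits_per_sample * channels // 8


for minutes in (15, 30, 60):
    print(minutes, "minutes:", voice_storage_bytes(minutes), "bytes")
# 15 minutes: 7200000 bytes
# 30 minutes: 14400000 bytes
# 60 minutes: 28800000 bytes
```

Under these assumptions the three levels correspond to roughly 7.2, 14.4, and 28.8 million bytes of storage.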

7.16 Automatic bookmark at stop - ability to bypass (1A, 2A, 3A)

If selected by the user, the software will automatically place a bookmark at the point in the playback where the user stops. The user can choose to bypass this feature.

7.17 Separate sets of bookmarks stored for each book (1A, 2A, 3A)

For example, if the user has four different books in process, the device maintains four separate sets of bookmarks.

7.18 Ability to name and export bookmarks (2B, 3B)

A user may need to move bookmarks from one machine to another if one is being returned for maintenance, or may wish to share his list of bookmarks with others reading the same book. Bookmarks shall follow NISO standard X?X to ensure interoperability among playback devices.

7.19 Ability to add information (highlighting and notes) (2B, 3A)

User can highlight portions of text, assigning text or voice labels, or more lengthy notes, to each marked section. The user can browse through a list of the full set of highlighted sections and then jump directly to a chosen portion. When the user is re-reading the document, an audible indicator should identify highlighted portions. As with cross-references, user options include enable (default), disable, and a choice among several audible indicators. The user should also be able to learn what label, if any, was attached to the highlighted section. (See section 10 of Text Navigation Features List.)

7.20 Ability to mark text for later access or export (2C, 3A)

Text can be marked/exported for later use (scrapbook, citations; i.e., a clipboard function that can be appended or overwritten). Limitations on the amount of text that may be exported will be set to meet copyright requirements.

7.21 Current location (possibly including time remaining to end of book) (1B, 2B, 3B)

User can obtain information about current location in book relative to the entire book such as "n% read, n% remaining" or "n hours remaining," and logical location such as "chapter 5, page 129, paragraph 3".
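
A sketch of how the relative-position message might be composed from elapsed and total playing time. The function name and the exact wording of the message are assumptions; the logical location (chapter, page, paragraph) would come from the document's markup and is not modeled here.

```python
def location_report(elapsed_seconds, total_seconds):
    """Build a spoken-style progress message from playing times in seconds.

    Hypothetical helper; times are measured at normal playback speed.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    percent_read = round(100 * elapsed_seconds / total_seconds)
    hours_remaining = (total_seconds - elapsed_seconds) / 3600
    return (f"{percent_read}% read, {100 - percent_read}% remaining, "
            f"{hours_remaining:.1f} hours remaining")


# Three hours into a nine-hour book.
print(location_report(3 * 3600, 9 * 3600))
# 33% read, 67% remaining, 6.0 hours remaining
```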

7.22 Spell words (2A, 3A)

Words can be spelled upon user request; spelling can also be done by phonetic alphabet (Alpha, Bravo, Charlie, etc.).
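
The two spelling modes can be sketched as follows. The function name and the output format (items separated by spaces, ready to hand to a speech synthesizer) are assumptions for illustration; characters without a code word are passed through unchanged.

```python
import string

# Code words in alphabetical order, one per letter a-z.
CODE_WORDS = ("Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet "
              "Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango "
              "Uniform Victor Whiskey X-ray Yankee Zulu").split()
PHONETIC = dict(zip(string.ascii_lowercase, CODE_WORDS))


def spell(word, phonetic=False):
    """Return the word spelled letter by letter, optionally with code words.

    Hypothetical helper for illustration.
    """
    if phonetic:
        parts = [PHONETIC.get(ch.lower(), ch) for ch in word]
    else:
        parts = [ch.upper() for ch in word]
    return " ".join(parts)


print(spell("cab"))                 # C A B
print(spell("cab", phonetic=True))  # Charlie Alpha Bravo
```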

7.23 Search by words, groupings of words (3B)

Text file can be searched by word/groups of words as input by the user.

7.24 Quick and easy access to multiple books (1A, 2A, 3A)

The interface allows the user to quickly access other documents, e.g., the user should be able to switch from one book to another quickly whether the book is a physical medium or has been/is being retrieved electronically. Documents can be switched easily with all user settings for the new document intact.

7.25 Multiple source capability (2C, 3C)

A variety of media/data sources can be accessed such as different physical media, a network stream, cable, and satellite.

7.26 Multiple users on 1 book simultaneously (2A/B, 3A/B)

Playback medium can be configured so that more than one user can access the book without disrupting settings set by previous users.

7.27 Automatic threading (follow story or article to completion) (1A, 2B, 3B)

Complicated document format can be followed in a logical order.

7.28 Multiple modes for accessing tables (2A, 3A)

The user can choose among a variety of ways in which a table can be read -- for example, by row, by column, or by specific cell. (See section 15 of Text Navigation Features List.)

7.29 Reading nested lists (???)

The user can invoke a function that assists with the comprehension of the layout of nested lists, specifically, the level at which a given item falls within the list. (See section 16 of Text Navigation Features List.)

7.30 Ability to identify text attributes (subscript, superscript, underlined, bold, etc.) (ASTER-like) (3A)

Text attributes can be easily accessed and displayed upon user request. The function can be enabled/disabled by the user.

7.31 Key information software -- searches document for key features (2C, 3C)

Software such as "Speech Skimmer" identifies most important elements in a document.

7.32 Fast forward and fast reverse (1A, 2A, 3A)

Controls allow user to move forwards or backwards through the text at 5 to 20 times normal speed, with audible feedback ("chatter," tones, or spoken cues) providing information on the structure of the document. See section 2 of Text Navigation Features List.


8. Help Functions

8.1 Key identifier (1B/C, 2A, 3A)

Buttons identify themselves when pressed. This function can be enabled/disabled by the user.

8.2 Confirmation messages ("earcons") (1A, 2A, 3A)

Spoken messages confirm user's action, e.g. system says "play" when "play" key is pressed. "Verbosity level" (how frequently the system speaks) is configurable by the user. (2B, 3B) (There are at least two dimensions for varying verbosity: 1. What percent of the controls are audibly confirmed? An experienced user might want only one or two important actions confirmed; a novice might want all confirmed. 2. How extensive is the confirmation message? Might experienced users need only a single word, e.g., "contents," while novices need a more extensive explanation such as "move to table of contents"?)

8.3 Audible error feedback (1A, 2A, 3A)

Tones or spoken messages are generated when user attempts an invalid function, e.g. pressing the "move forward" key when already at the end of the document. Configurable/queryable. One configuration might involve the use of a tone to indicate an error, with a spoken message following if invoked by the user.

8.4 "Drill down capability" (Tell me where I am.) (1B, 2B, 3B)

More button presses provide more detailed information. For example, multiple presses might elicit location information such as title, chapter, page, etc.
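
A minimal sketch of the drill-down behavior, assuming the location is held as an ordered list from most general to most specific and that each additional press adds one level of detail. The function name and the sample data are hypothetical.

```python
def where_am_i(location, presses):
    """Return location detail that grows with the number of button presses.

    location is an ordered list of (label, value) pairs, general to specific.
    Presses beyond the available levels repeat the fullest description.
    """
    depth = max(1, min(presses, len(location)))
    return ", ".join(f"{label} {value}" for label, value in location[:depth])


# Illustrative data only.
location = [("title", "Example Book"), ("chapter", 5), ("page", 129)]
print(where_am_i(location, 1))  # title Example Book
print(where_am_i(location, 3))  # title Example Book, chapter 5, page 129
```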

8.5 Help functions

Basic help functions (1B)
Rich set of context-sensitive helps (2B, 3B)

8.6 Summary Information

User can query the device and obtain a quick overview of the document (e.g., 563 pages, 9 hours of play, 4 levels of heading, 2 parts, 12 chapters, 5 tables). Information is dynamic and can be tailored to the contents of the user's current location (e.g., chapter 11). (See section 20 of Text Navigation Features List.)

8.7 Optional reminders (e.g., book overdue) (1B, 2B, 3B)

Capability exists for lending agency to add audible reminders to documents distributed to users.


9. File/Stream Format

9.1 Rich source base/file (able to produce various production outputs) (1A, 2A, 3A)

The original document file contains as much information as possible so that the variety of outputs possible is maximized.

9.2 Human and electronic speech must be available; but both are not required to play book (1A, 2A, 3A)

Although the system is designed for the linking of electronic text and narrated audio, a document can be played even when one of these modalities is not included.

9.3 Functionality transferable to other devices (platform independent) (3B)

The software-based playback system must be useable on several types of computers.

9.4 Functionality based on source (e.g. presentation software comes with book) (1C, 2C, 3C)

A digital talking book includes some or all of the software needed to play it; perhaps the behavior of the player can be modified or updated with software that accompanies the document.


10. Compatibility

10.1 Compatible with commercial books (1B, 2B, 3B)

The device or software can play audio/electronic books which were not produced specifically for use by blind or physically handicapped persons.

10.2 Can play music (1C, 2C, 3C)


11. Multilingual Functions

11.1 Multi-lingual message capability (1B, 2B, 3B)

Control messages such as key identification, confirmation, error feedback, etc. can be rendered in multiple languages.

11.2 Multi-lingual text to speech (2C, 3C)

Text files can be translated into synthetic speech in multiple languages.


12. Presentation Functions

12.1 Text file can be read via a connected braille display. (2A, 3A)

12.2 Text file can be displayed through link to video monitor. (2B, 3B)

12.3 Characteristics of visually-displayed information can be controlled (color, size, zoom, contrast, font, spacing, background, etc.) (3B)

12.4 Visual elements can be presented in alternative formats

Speech (1A, 2A, 3A)
Others (2B, 3B)

13. Copyright

13.1 Copyright protection system ensures that audio and text files of copyrighted material are accessible only to eligible users. (1A, 2A, 3A)

14. Administrative Aspects

14.1 Self diagnostics (1A, 2A, 3A)

Device stores hours used, last service date, etc. Reports source of malfunction when queried.

14.2 Accessible serial number (1B, 2B)

Serial number is accessible to blind and visually-impaired users contacted by auditors verifying machine location. Number is present in large print and can be spoken by the device on command. May be present in tactile form as well.

14.3 Device malfunctions can be diagnosed remotely. (1B, 2B, 3B)

14.4 Easy to upgrade (1A, 2A, 3A)


Document created December 30, 1999
Tagged with HTML April 7, 2000
Send comments to:

Michael Moodie
[email protected]
Research and Development Officer
National Library Service for the Blind and Physically Handicapped
Library of Congress
Washington, DC 20542