Digital Talking Books, Planning for the Future

July 1998

Working with the National Information Standards Organization

December 1996: Technical Standard to Be Developed for Digital Talking Book

In December 1996, NLS director Frank Kurt Cylke announced that NLS had initiated the development of a technical standard for digital talking books (DTBs) through the National Information Standards Organization (NISO). This action was the first step in designing and implementing the next-generation library-accessible medium for blind and physically handicapped individuals.

A major development for the NLS program, the NISO digital talking-book standard will address features, file specifications, user control of playback devices, production issues, and a copyright protection scheme. Parties participating will include patrons, patron-advocacy organizations, media producers (both volunteer and commercial), rights owners, equipment producers, and librarians.

NISO is the only organization accredited by the American National Standards Institute (ANSI) to develop and maintain technical standards for information services, libraries, publishers, and others involved in the business of creation, storage, preservation, sharing, accession, and dissemination of data. There are currently more than fifty American National Standards used by such organizations, including CD-ROM Volume and File Structure (NISO/ANSI/ISO 9660), Information Retrieval (Z39.50), Information Interchange Format (Z39.2), International Standard Serial Numbers (Z39.9), and Common Command Language (Z39.58).

"At present, library access for patrons is well served by analog cassette tape technology," Cylke said. "This technology has enjoyed the acceptance and economy found in the consumer entertainment market for more than two decades. However, as digital technology gains favor in the marketplace, analog cassettes are likely to become less attractive from both the financial and consumer-preference standpoints. These two forces, economic and preferential, will ultimately converge to motivate change. This NISO standard development program will allow the change from analog to digital to be controlled and consistent with the interests of all concerned."

Developing the Standard

In announcing the project, NLS research and development officer Michael Moodie, who directs the project activities, outlined the scope and application of the DTB standard. According to Moodie, "The standard will define the file specification for a digital talking book; that is, the manner in which the various components of the DTB are coded. In addition, these sets of related guidelines will be developed: required features in a DTB, the user interface for a DTB player, and DTB production guidelines. Potential implementers include talking-book producers, manufacturers of digital and analog hardware, developers of multimedia authoring and presentation software, and media producers."

Patricia Harris, executive director of NISO, announced that NLS, as the sponsoring organization, will chair the standards-development committee. Representatives will be invited from the American Council of the Blind, Association for Education and Rehabilitation of the Blind and Visually Impaired, American Foundation for the Blind, American Printing House for the Blind, Blinded Veterans Association, National Federation of the Blind, and Recording for the Blind and Dyslexic. Other organizations from both the public and private sectors will be included, along with representatives of engineering and library interests.

Howard White, editor of the American Library Association's Library Technology Reports, will serve as liaison between NISO and the NLS-sponsored effort. Commenting on the complexity of the undertaking, John Cookson, head of the NLS Engineering Section, said, "The impact on users moving from existing practices to the new digital standard must range from 'virtually transparent' (products seemingly the same to the user but with technical improvements) to 'profound' (products with a range of options for the more technologically sophisticated patron).

"This impact statement focuses on the blind and physically handicapped patron," said Cookson. "However, there is an infrastructure of 'users' who support and implement the library system. This wider community includes librarians; producers of talking books and magazines both commercial and volunteer; equipment manufacturers; and software developers. Each of these groups will be affected differently by the change to a digital standard. For example, audio studios may continue to narrate into conventional analog equipment, but their product would become usable only by processing through digital encoding software that is not found in today's production stream," Cookson concluded.

Implementing the Standard

Wells B. Kormann, chief of the NLS Materials Development Division, also commented on the value and potential implementation of the project: "The entire user community will be motivated to use this standard. The existing system is analog cassette tape, while the standard will define a system that will be digitally based but not restricted to any particular distribution media or implementation. Because of this fundamental incompatibility, the change will require a transition period in which both systems are in use. Time frames for introduction of new equipment depend on the commercial development and availability of adaptable consumer electronic hardware and software products."

According to Kormann, "Anticipating an additional ten years of acceptable and economical use for cassette tapes means that the standard must be finished within five years to allow for a five-year transition period."

May 1997: NISO Process Begins for Digital Talking Book

The NISO Talking Book Standards committee held its first meeting in May 1997 to begin the process of designing a DTB system. This two-day standards development meeting was hosted by NLS. More than two dozen individuals from companies, libraries, and organizations that serve and represent blind and physically handicapped persons attended. The representatives contributed expertise in the areas of consumer electronics, library service, engineering, audio book production, computers, standards development, international compatibility and design, information access, and adaptive technology.

Rosemary Kavanagh, executive director of the Canadian National Institute for the Blind's Library, said of the meeting, "The spirit of cooperation and workmanship which prevailed was excellent and commendable." She added, "I remain much more hopeful that we will see standards that not only allow us to exchange books and materials but also to buy shelf-ready items from talking-book producers the world over."

Features Identified

Committee members identified and prioritized more than one hundred features for a future DTB system. These features describe the needs of blind and physically handicapped persons who will be using the system. "By focusing on user requirements rather than specific hardware or media, we can develop a standard that will keep pace with the rapid changes in technology," said Michael Moodie, NLS research and development officer, who is coordinating the committee's activities.

In order to define features for a system that does not yet exist, committee participants created an exhaustive list of characteristics for every aspect of the system. These included audio quality and controls, user interface, power sources and requirements, media navigation, help functions, copyright protection, multilingual functions, text display capabilities, and administrative considerations. The deliberations resulted in a description of a digital book that would incorporate text, voice, and other data with varying levels of user control, functionality, and data richness available to the reader.

The committee proposed three levels of audio playback devices: a basic six-button, portable, audio-only model; an advanced stand-alone model with more capabilities for students and professionals; and a computer-linked software version that would be connected to a personal computer and allow for advanced text navigation. After defining both the features and the levels of complexity, the participants assigned each feature to the appropriate device and determined whether the features were essential, highly desirable, or useful. For example, start and stop controls would be essential features on all three machines, while the ability to add notes or to highlight text would not be present in the basic machine, but would be highly desirable in the advanced unit and essential in the computer-linked version.
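
The assignment of features to device levels can be pictured as a small lookup table. The Python sketch below is illustrative only and is not part of the committee's work: the names Priority, FEATURES, and features_for are hypothetical, and the two rows shown are the only ones taken from the example above.

```python
from enum import Enum
from typing import Dict, List, Optional


class Priority(Enum):
    ESSENTIAL = 1
    HIGHLY_DESIRABLE = 2
    USEFUL = 3


# The three proposed playback device levels.
DEVICES = ("basic", "advanced", "computer-linked")

# Hypothetical table: feature -> device -> priority.
# None means the feature is not present on that device.
FEATURES: Dict[str, Dict[str, Optional[Priority]]] = {
    "start/stop controls": {
        "basic": Priority.ESSENTIAL,
        "advanced": Priority.ESSENTIAL,
        "computer-linked": Priority.ESSENTIAL,
    },
    "add notes or highlight text": {
        "basic": None,
        "advanced": Priority.HIGHLY_DESIRABLE,
        "computer-linked": Priority.ESSENTIAL,
    },
}


def features_for(device: str, priority: Priority) -> List[str]:
    """Return the features rated at the given priority for one device."""
    if device not in DEVICES:
        raise ValueError(f"unknown device level: {device}")
    return sorted(
        name
        for name, levels in FEATURES.items()
        if levels.get(device) is priority
    )
```

Calling features_for("basic", Priority.ESSENTIAL) on this table returns only the start/stop controls, while the same query for the computer-linked version also returns note-taking and highlighting.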

Working Groups Formed

The committee's next step was to form working groups to examine the many specialized areas of concern. One group will use the prioritized list of features to create an organized document with expanded descriptions and examples for review by the full committee. Another group will focus on the DTB file format. Digital file format guidelines and specifications are undergoing rapid developments and changes because of the explosive growth of World Wide Web (Internet) publishing. A third group will research user interface--specifically matters such as design of controls, tactile and visual markings, and feedback to users. Copyright protection in the digital domain became a fourth working group's area of research. This issue is currently receiving much attention in the digital publishing community.

The groups presented their findings at the committee's next meeting in September 1997. Under NISO guidelines, a first draft of the standard is due within eighteen months of the first meeting, by November 1998. A final standard is expected in two to four years.

September 1997: NISO Group Moves Ahead

Experts in the fields of digital technology, librarianship, service to blind and physically handicapped persons, and other related fields met at NLS in September 1997 to continue the process of developing a standard for a DTB system. Under the auspices of NISO, the Digital Talking Book Standards Committee of more than two dozen participants shared ideas and sharpened their focus to keep pace with fast-moving developments in technology.

At its first meeting in May 1997, the committee established working groups. Each working group was asked to concentrate on a particular area: expanded descriptions for the features list, file format for digital talking books, the user interface, and copyright issues.

At the second meeting of the full committee, in September 1997, members reported the findings of the working groups and continued discussion of specific features for a future DTB system. They determined that the bulk of the group's activities for the near future should concentrate on defining requirements for the digital talking book itself, rather than the playback device. The committee agreed that the central task was to develop a file specification for the content of the DTB.

File Specification

The file specification will describe how the audio and textual material of a digital talking book should be coded; that is, what codes should be used for given functions. For example, a digital book should be structured so that a blind patron listening to the table of contents could jump immediately to a desired chapter. The file specifications would describe what codes an audio book producer would insert in the recording to make that jump possible.

Mark Hakkinen, representing The Productivity Works, Inc., and chair of the file specification working group, reported significant activity in this area among several organizations. He explained that standards and protocols written for the Internet are "likely to play some role in the delivery of DTBs." The World Wide Web Consortium (W3C) is facilitating the creation of specifications that allow for the delivery of audio and text content through the Internet. W3C is an international industry consortium that develops common protocols for the evolution of the World Wide Web. In April 1997, the W3C launched the Web Accessibility Initiative to promote Web functionality for people with disabilities. "A major premise of the Web happens to be open, standards-based protocols and languages," said Hakkinen.

He also noted that the DAISY Consortium, a multinational effort originating in Sweden to design a DTB system, made a decision early in 1997 to move to an open, standard file format. George Kerscher, of Recording for the Blind and Dyslexic, and Thomas Christensen, of the Danish National Library for the Blind, represent the DAISY Consortium on the NISO DTB Standards Committee. Their input in both efforts will promote an open exchange of information and decrease duplication of effort.

Copyright Protection

The group examining copyright protection submitted its findings in the form of a special presentation by Mary Levering, associate register for national copyright programs, Library of Congress, and former chief of the NLS Network Division. Levering gave a brief history of U.S. copyright law, explaining its roots in the Constitution and its use in the protection of creative expression.

She noted that libraries for the blind use special measures, such as producing talking books in a nonstandard 4-track cassette format and ensuring that users meet eligibility standards, to curb the illegal redistribution of copyrighted works. These protections "help maintain a balance between the rights of copyright holders and the rights of users of those works," according to Levering.

In the digital arena, the retransmission of copyrighted works is a matter of great concern to copyright holders because of the ease of making an almost endless number of perfect copies of an original work. Levering outlined the activities of international organizations, such as the World Intellectual Property Organization, to deal with digital issues. She explained that these endeavors seek "to ensure that laws and practice adapt to protect and support the wonderful creative output without cutting it off at the limbs."

Levering explained that some of the technological controls under development include digital object identification, watermarking, data encryption, and electronic signatures for images, written text, and sound recordings.

Action Items Planned

The committee established two new working groups. The first new group will assemble a comprehensive list of navigation and manipulation features for an advanced computer-based audio playback device. This compilation will contain the full range of features desirable on a DTB. The second new group will craft digital production guidelines for audio-book producers. The file specification group will continue its work to create or identify an appropriate file format, ensuring that the file structure is capable of supporting all the features identified by the group working in that area. Where possible, members will try to match existing specifications so as not to duplicate ongoing work. The working groups will meet separately before the next full committee meeting.

March 1998: Working Groups Give Reports

As agreed at the September 1997 meeting of the Digital Talking Book Standards Committee, three working group meetings were held in January 1998. The committee, working under the auspices of the National Information Standards Organization (NISO), is developing a standard for a digital talking-book (DTB) system. The first meeting, held on January 9 at the National Center for the Blind in Baltimore, brought together members of the working group on text navigation features. This group was charged with developing a comprehensive list of features that would be required by the most advanced digital talking-book user reading the most complex DTB. Their draft report described nearly fifty features, including such capabilities as moving through the book a word, sentence, paragraph, or page at a time; jumping directly from the table of contents or index to an item listed there; placing bookmarks at important points throughout the document to which a user can quickly return; and searching for a specified word or phrase.

On January 14, 1998, the working group on production guidelines met at NLS in Washington. Basing their discussions on the draft report from the working group on text navigation features, members looked at each of the features described and analyzed its impact on the production of a DTB. Approximately half the features listed would require some special intervention during production and half could be implemented solely through functions built into the playback device. The working group drafted a report of its discussions, annotating the report of the navigation features group with production-related commentary.

On January 15, 1998, the third working group met at NLS to address issues related to the file specification. This group also followed the outline of the report from the working group on navigation features, assessing how the file specification should be structured in order to implement each feature listed. The general consensus of the group was that the file structure of a DTB would consist of three major parts: an audio file, which could be encoded in any of several standard audio codecs (compression/decompression algorithms that allow enormous audio files to be greatly compressed); a text file (necessary for word spelling and text searches) with tags inserted from a descriptive markup language, probably HTML 4.0 (Hypertext Markup Language); and a linking file that synchronizes the audio and text files, probably written in SMIL (Synchronized Multimedia Integration Language). Discussion revealed that HTML alone did not have sufficient elements to handle all the complexities of digital talking books and that additional tools such as Cascading Style Sheets or XML (Extensible Markup Language) would be needed.
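
The relationship among the three parts described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the committee's file format: the class names, the sentence identifiers, the file name, and the timings are all hypothetical, and plain dictionaries stand in for the marked-up text file and the linking file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AudioClip:
    begin: float  # seconds from the start of the audio file
    end: float


@dataclass
class DigitalTalkingBook:
    audio_file: str              # name of the (compressed) audio file
    text: Dict[str, str]         # stand-in for the text file: id -> sentence
    links: Dict[str, AudioClip]  # stand-in for the linking file: id -> clip

    def find(self, phrase: str) -> Optional[AudioClip]:
        """Search the text and return the audio clip of the first match."""
        for element_id, sentence in self.text.items():
            if phrase.lower() in sentence.lower():
                return self.links[element_id]
        return None


# Hypothetical two-sentence book with made-up timings.
sample = DigitalTalkingBook(
    audio_file="book.audio",
    text={"s1": "Chapter one begins here.", "s2": "The second sentence follows."},
    links={"s1": AudioClip(0.0, 2.5), "s2": AudioClip(2.5, 5.0)},
)
```

A text search such as sample.find("second") returns the clip for the matching sentence, which a player could then use to position the audio. This is why the text file is needed for searching even when the reader is only listening.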

In addition to the three major components just described, the group also recognized the need for a "Book Information File" that would hold summary information about the book and a "Navigation Center" that would include every significant text element in the book, from dust jacket information, copyright statement, and foreword, through chapters, sections, and subsections, whether or not these items were listed in the table of contents. The navigation center, as its name implies, would be the primary tool used by readers to move through a document.
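
One way to picture the navigation center is as a tree of every significant text element, of which the print table of contents is only a subset. The Python sketch below is an assumption-laden illustration, not the committee's design: NavNode, the in_print_toc flag, and the sample titles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class NavNode:
    title: str
    in_print_toc: bool = False  # True if the print table of contents lists it
    children: List["NavNode"] = field(default_factory=list)


def walk(node: NavNode, depth: int = 0) -> Iterator[Tuple[int, NavNode]]:
    """Yield (depth, node) pairs in reading order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def navigation_center(root: NavNode) -> List[str]:
    """Every element below the book itself, listed or not."""
    return [node.title for _, node in walk(root) if node is not root]


def print_toc(root: NavNode) -> List[str]:
    """Only the elements that appear in the print table of contents."""
    return [node.title for _, node in walk(root) if node.in_print_toc]


# Hypothetical book used only for illustration.
book = NavNode("Sample Book", children=[
    NavNode("Dust jacket information"),
    NavNode("Copyright statement"),
    NavNode("Foreword", in_print_toc=True),
    NavNode("Chapter 1", in_print_toc=True, children=[
        NavNode("Section 1.1", children=[NavNode("Subsection 1.1.1")]),
    ]),
])
```

For this sample, print_toc(book) lists two items while navigation_center(book) lists six, which illustrates how the navigation center can contain much more detail than the print table of contents.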

After individually reviewing the working groups' reports, the full committee met on March 15 and 16, 1998, in Los Angeles. Members first discussed the report on navigation features. A key area of focus was the relationship between the table of contents found in the print document and the "navigation center" discussed above. Members were concerned that if both included a full range of navigation features and options, yet were different (the navigation center would normally contain much more detail than the print table of contents), users would be confused by the differences. The group recommended that the print table of contents contain no navigation options other than hypertext links to the items listed, while the navigation center would be very flexible and could be accessed in a wide variety of ways. The committee asked that the working group incorporate the committee's recommended changes and then make the document widely available to consumers and other interested parties for comment.

Three members gave demonstrations of DTB hardware and software prototypes. Gilles Pepin of VisuAide, Inc., demonstrated "Victor," a CD-ROM-based DTB player that incorporates nearly all of the functions listed in the navigation features report. He also showed the group a PC-based "time-scale modification program" that altered the speed of a portion of narration over a wide range without changing the pitch of the narrator's voice. Dennis DeVendra of Recording for the Blind and Dyslexic played a prototype DTB on a laptop computer, illustrating how a user could move instantly from one part of a DTB to another and how the text and audio segments were synchronized at the sentence level, so that a sentence would be highlighted on the screen while the audio portion played. He explained that the two pieces of the DTB had been synchronized by a special program that automatically matched a sentence in the text file with the corresponding sentence in the audio file. Automatic word-level synchronization also appears quite feasible. Mark Hakkinen of The Productivity Works, Inc., also demonstrated a digital talking book, this one produced using SMIL to link the text and audio files. All three demonstrations revealed the great potential offered by digital talking books for ease of use, enhanced access to information, and fast, flexible navigation through a document.

Mark Hakkinen, chair of the file specification working group, led a discussion on developments in each of the three areas of the file structure: text, audio, and linking.


The committee continued the discussion begun in the January file specification meeting regarding the choice of markup language for the text file. The strengths of HTML 4.0 are that it is widely used and understood and that many authoring tools are currently available for it. However, it would require the use of supplementary tools to handle all of the complexities of a DTB. In contrast, XML appears to have the tools and flexibility needed to implement DTBs, but is very new and thus little known. It was agreed that XML represented the more promising approach but that further evolution of markup languages like XML and HTML was inevitable. Rather than endorse one at this point in the standards development process, the committee chose to focus instead on the underlying requirements of text markup. It was agreed that the standard would identify all of the semantic elements to be used in digital talking books and would also include a sample "Document Type Definition" (DTD) for XML. The DTD shows how the elements defined in the standard are identified and used in the specific markup language.

A new working group was created to develop the markup-language specification. Called the MarkUp Specification Team ("MUST"), it will be chaired by George Kerscher of the DAISY Consortium and include ten other committee members.


Lloyd Rasmussen of NLS presented an overview of the current status of audio codecs. He described the different approaches the codecs take, each with its own strengths and weaknesses. His conclusion was that any of several widely used codecs can provide high-quality sound and significant savings in storage requirements.


The Synchronized Multimedia Integration Language (SMIL) continues to progress toward standard status. Since January, major software firms interested in utilizing SMIL performed an interoperability test, successfully demonstrating test versions on each other's playback systems. SMIL is expected to receive "recommendation" status (equivalent in this arena to an approved standard) by May 1998. Several test DTB fragments have been developed and it was recommended that committee members' organizations do sample implementations of their own to test SMIL further.

Michael Gosse of the National Federation of the Blind presented a proposal for synchronizing the text and audio files at the word level, using a separate binary file to indicate the precise time at which each word begins in the audio track. When a user identifies a specific word in the audio file, the playback device would calculate its position (e.g., tenth word in the paragraph) and locate the word at the same position in the corresponding paragraph in the text file. The committee discussed several mechanisms for limiting the size of the binary file so that it would not significantly impact the overall size of the DTB.
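
The word-position lookup described above can be sketched as follows. The proposal as reported here does not specify a binary layout, so this Python example assumes one: each word's start time is stored as a little-endian unsigned 32-bit count of milliseconds. The function names and sample timings are likewise hypothetical.

```python
import struct
from bisect import bisect_right
from typing import List


def pack_word_starts(starts_ms: List[int]) -> bytes:
    """Encode word start times (assumed layout: 4 bytes per word)."""
    return struct.pack(f"<{len(starts_ms)}I", *starts_ms)


def unpack_word_starts(blob: bytes) -> List[int]:
    """Decode the assumed binary layout back into a list of times."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}I", blob))


def word_index_at(starts_ms: List[int], time_ms: int) -> int:
    """Return the 0-based position of the word being spoken at time_ms."""
    index = bisect_right(starts_ms, time_ms) - 1
    if index < 0:
        raise ValueError("time precedes the first word")
    return index


def word_at(paragraph_text: str, starts_ms: List[int], time_ms: int) -> str:
    """Locate the word at the same position in the paragraph's text."""
    return paragraph_text.split()[word_index_at(starts_ms, time_ms)]


# Hypothetical paragraph and timings, in milliseconds.
paragraph = "The quick brown fox jumps"
blob = pack_word_starts([0, 300, 650, 1000, 1400])
```

With these sample values, a listener who stops the audio 700 milliseconds into the paragraph is at the third word, so word_at(paragraph, unpack_word_starts(blob), 700) returns "brown". At four bytes per word in this assumed layout, the timing file grows with the length of the book, which is the size concern the committee discussed.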


During a discussion of copyright issues, Mary-Frances Laughton of Industry Canada reported that the Association of Canadian Publishers (ACP) has been meeting with Canadian producers of alternate format materials to discuss methods of ensuring that only eligible populations have access to such materials. The ACP initiated the project but was impressed with the commitment of alternate format producers, as demonstrated by current practice, to limiting access to copyrighted materials. Laughton will distribute the report to committee members when it is released in May.

Action Planned

The first working group will incorporate committee recommendations into its navigation features document and distribute it widely for comment. The file specification working group will track and report on developments in SMIL and markup languages. The recently formed fourth working group will meet at least once to begin development of the markup-language specification. The full committee will meet next in October.

Participating Agencies

American Council of the Blind
American Foundation for the Blind
American Printing House for the Blind
Association for Education and Rehabilitation of the Blind and Visually Impaired
Association of Specialized and Cooperative Library Agencies
Blinded Veterans Association
Canadian National Institute for the Blind
The Hadley School for the Blind
Industry Canada, Assistive Devices Industry Office
LaBarge Electronics
NCR Corporation
National Federation of the Blind
National Institute of Standards and Technology
National Information Standards Organization
National Library Service for the Blind and Physically Handicapped
The Productivity Works, Inc.
Recording for the Blind and Dyslexic
Telex Communications, Inc.
TRACE Research and Development Center, University of Wisconsin
VisuAide, Inc.
World Blind Union

Robert E. Fistick, Head, Publications and Media Section
Vicki Fitzpatrick, Senior Writer-Editor, Publications and Media Section
George Thuronyi, Writer-Editor, Publications and Media Section

