The Library of Congress > Information Bulletin > January 2006

Records for the World
Unicode Allows Search in Non-Roman Characters

By GAIL FINEBERG

The Library recently upgraded its integrated library system (ILS) to support Unicode, a character-encoding standard that enables the on-screen display of all scripts and characters of the 80 languages in which the Library regularly catalogs.
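For readers curious about what a character-encoding standard does, the following is a minimal sketch in Python. It is not the Library's or Voyager's code; the sample title and the helper names `describe` and `round_trips` are assumptions made for illustration. It shows that Unicode assigns every character a numbered code point, so a title in its original script can be stored and read back unchanged, while an ASCII-only encoding cannot represent it at all.

```python
import unicodedata

# Hypothetical sample data: a romanized title and the same title in Cyrillic.
romanized = "Voina i mir"
original = "Война и мир"


def describe(text):
    """Return (character, code point, Unicode name) for each non-space character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if not ch.isspace()
    ]


def round_trips(text, encoding="utf-8"):
    """Return True if text survives encoding to bytes and decoding back."""
    return text.encode(encoding).decode(encoding) == text


def fits_ascii(text):
    """Return True if every character can be encoded as plain ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch, code_point, name in describe(original):
        print(ch, code_point, name)
    print("UTF-8 round trip:", round_trips(original))
    print("Romanized fits ASCII:", fits_ascii(romanized))
    print("Original fits ASCII:", fits_ascii(original))
```

In this sketch the romanized form fits in ASCII and the Cyrillic form does not, which mirrors the situation the article describes: a system limited to a narrow character repertoire can show only the romanized text.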

"A universal collection requires bibliographic records in all languages. Unicode allows us to move closer to that goal," said Deanna Marcum, associate librarian for Library Services.

Library catalogers have been cataloging in non-roman scripts for the past 20 years, but the Library's integrated library system could display only the romanized text in bibliographic records. Voyager, the software used by the Library's ILS, could not display the non-roman characters.

"Now, for the first time, users will be able to see the bibliographic data in their own languages," said Ann Della Porta, acting coordinator in the Integrated Library System Program Office.

After two years of testing the new Voyager with Unicode software — the most complex upgrade since the ILS became operational on Aug. 16, 1999 — the Library has successfully converted its database of more than 34 million records to Unicode. The data conversion process took nine days.

"With more than half our titles in the Library's database in languages other than English, we wanted to make quite sure all the data in our bibliographic database were converted properly," said Della Porta.

The Library has more than 14 million bibliographic records, more than 14 million "holdings" records that contain location and call-number information, and more than 6 million "authority" records that establish standardized names of individuals, corporations, geographic places and subjects. All were converted to Unicode.

Della Porta explained what the implementation of Unicode means for the Library, other libraries, librarians and library patrons.

"As for the Library, we'll be able to deliver linguistically accurate displays to users of our online catalog. Our users will see the scripts and characters that they know and read," she said.

Users, including other libraries as well as individuals, also will be able to search and print in those scripts and characters, she said.

"We'll be able to record the scripts and characters that appear in the materials we're cataloging," she said. "We'll be able to exchange bibliographic data [both import and export] in a standard format used throughout the world."

Della Porta noted that the Library of Congress is the standards-setting agency for libraries, not only in the United States but also around the globe. By providing standards for Unicode conversion in libraries, the Library has enabled other libraries to adapt. "The Library's implementation of that standard will ensure that our products are universally useful," she said.

"We have been looking forward to the Unicode implementation in the ILS for a long time and are appreciative of all those who have worked persistently to make it possible. Having Unicode will greatly facilitate the research of those who use resources written in non-roman scripts," said Carolyn Brown, director for collections and services in Library Services.

Gail Fineberg is editor of The Gazette, the Library's staff newsletter.

Back to January 2006 - Vol 65, No.1
