Catalog Enrichment Services
Syndetic Solutions, Inc.

Jeff Calcagno, Director of Sales & Customer Support
Library of Congress Conference
on Bibliographic Control in the New Millennium
16 November 2000

Final version

I want first to acknowledge that many of the bibliographic enrichment data elements which I will be discussing are not new to the library community and library users. For some time it has been well established that tables of contents, summaries, annotations, analytical notes, etc., are valuable additions to the library catalog and the library users' information-seeking experience. Many individuals at this conference have done extensive research, dating back over twenty years, clearly demonstrating the usefulness of this data.

I should also go on record to state that libraries have correctly responded to these needs by establishing standards, both under cataloging rules and in the MARC format, to make many of these data elements available for their patrons' use, whenever possible. There is no group more qualified to create such information. Of course, finding the time and financial resources to create the data locally has proved to be increasingly difficult.

"Raised Expectations"

Though it is increasingly costly to produce at the local level, Syndetics believes bibliographic enrichment data will play an important role in the future development of the library OPAC, library web sites, and what is often now being called the "library portal". If we anticipate leisure reading to continue to increase, and as a large legion of life-long learners marches into "retirement", coming to the library, either physically or "virtually", to find a good book is going to take on a much more complex meaning. And, of course, researchers, both non-professionals and those in academia, are also gleaning much more information from sources that were inaccessible before the Web. They now utilize an astonishing array of highly structured abstracting and indexing files and full-text databases unavailable until a few short years ago.

I should also hope none of us is too surprised if a whole new group of users begins to discover all that the library can provide. Many of them are "lining up" now, if you will, at the local online bookstore, and they are experiencing a plethora of information about books in the form of cover images, summaries, annotations, tables of contents, reviews, author interviews and biographies and, of course, the ubiquitous reader reviews (everybody has an opinion!). While the approach to developing these types of enhanced bibliographic databases has appeared to be "more is better", Syndetics believes that the library catalog is a much different access tool that will require a more careful and discerning approach to fulfilling its users' information needs. But it is certain that libraries must begin using new and creative methods for bridging access to their collections in a more comprehensive manner. I also should emphasize that the technology and production capacity are now available to begin bridging these "bibliographic gaps" in a timely manner. It is clearly time for libraries to satisfy the raised expectations of their users.

Enrichment data benefits OPAC users in several ways:

In addition to Syndetics' own enrichment creation efforts, large quantities of useful enrichment data are also now available from thousands of sources, including publishers, book wholesalers, review sources, and others. Not surprisingly, many of these data elements are available in a multitude of electronic formats and editions, which is another issue that appears to be a hot topic at this Conference. We will certainly be interested in any developments that take place here in this regard.

Syndetic Solutions

Syndetics was founded by a group of librarians and library researchers over two years ago to provide a single source for a wide range of bibliographic information to enhance library public access catalogs. To this end we have established relationships with publishers, book wholesalers, and review sources to make this information available to libraries and booksellers. By becoming a reliable aggregation source, we are also developing relationships within the library community to incorporate enrichment data into library catalogs.

Enrichment Data

What data is available? Syndetics' set of databases includes over 1.5 million separate bibliographic enrichment data elements, and it is growing at the rate of approximately 5,000 data elements each week, or roughly 250,000 elements each year. And much more is to come.

Tables of contents, summaries and annotations are certainly available.

Syndetics can also provide enhanced fiction descriptors that give readers considerable precision in finding works of fiction and biographies. These include precise genre and sub-genre headings, character names and their personal attributes (e.g., gender, ethnicity, occupations, etc.), geographic settings, and series and award information, all fully searchable in the library catalog. Author notes and lists of contributors, which provide useful information about an author's educational background and institutional affiliation, can also be supplied for many scholarly titles.

And we also have available a large number of book reviews and first chapters from a number of sources that cover both trade and scholarly materials.

We also offer cover images, book jackets, cover art or whatever you want to call them. They are often visually pleasing and add graphics to an otherwise text-filled screen. For some of us, they may even have a useful access feature. In fact, we are now working on a program to create keyword descriptors of cover images as searchable data elements. So you may yet find that cookbook with the green and yellow cover! Syndetics presently offers three different sizes of cover images, from "thumbnail" to large, and we are now working with libraries to include them in their catalogs. The library catalog will certainly never look the same.

Enrichment Data Attributes

Scope

Syndetics' intended coverage includes English-language monographs currently in print, and we make every attempt to gather as many enrichment data elements as possible about each title. The majority of our enrichment data covers titles published since 1985; however, Syndetics also manages several retrospective projects that yield enrichment data for many out-of-print titles. We are also now beginning to expand our coverage to include non-English language titles, including French, German, and Spanish.

Because Syndetics receives information from a large number of data providers, consolidating and standardizing data formats is a critical component of our services. Information is delivered to Syndetics in a variety of formats, including MARC, ASCII (text and delimited), HTML, XML and XML variants, and many proprietary formats.

Timeliness

Giving libraries the advantages of a single source for aggregating enrichment data, without also providing timely availability, will often make the data less useful to users. Whether data is received in electronic or print form for conversion, editing and distribution, it is important that this process takes place in a timely manner. Because many libraries order books prior to publication, much of their cataloging data requires enrichment shortly after the book has been ordered. Most information Syndetics receives arrives on a set schedule from our providers; conversion work is often accomplished within several hours, and editing work is often accomplished within 24-48 hours. This is an area where libraries may wish to carefully examine performance benchmarks from enrichment data suppliers.

Relevance

Accurate and precise enrichment data improves search access. Tables of contents, summaries, enhanced fiction descriptors and, in the near future, indexes and chapter-level bibliographies, are rich in useful keywords. But not all enrichment data is appropriate for all libraries. As an aggregator of catalog enrichment data, Syndetics continuously evaluates and implements editing procedures that retain useful and consistent enrichment data for libraries, carefully considering the costs versus benefits. We welcome input from libraries and their users on the appropriateness of various enrichment data elements and hope additional exposure to enrichment data of all types will lead the library community to some general consensus allowing Syndetics to train our focus accordingly.

Development

Through our own research, and through discussions with librarians, Syndetics has been working to identify other enrichment data elements that we plan to make available to library users in the future. Let me quickly note four of the most significant programs.

Indexes - These are specifically noted in Michael's paper, and they are certainly worth discussing. Syndetics has been working on an index conversion project for some time. A concern for us is how or even whether to attempt a consistent format. There are also issues related to the cross-reference structures contained in many indexes and the inclusion of author names in indexes. Both of these authority control concerns give us pause as we think carefully about what library users will demand. Finally, the sheer size of most indexes will incur considerable conversion and editing costs. We do envision use of machine-readable indexes for selected works in the near future, and we have started working with libraries to determine what types of materials will benefit most from having searchable indexes included in their catalogs and in what format. Once library test partners have been identified, we will begin a pilot project.

"Suggested Readings" & bibliographies - Having search access to this type of information will provide one additional research tool for scholars looking for related research work or attempting to locate works by a given researcher. It will also make "browsing" the catalog that much more productive. Syndetics has completed format definitions for the standard data elements; they will be fully parsed and standardized so they can be hyper-linked to the related titles and authors. Syndetics will begin a pilot project in 2001 to create approximately 10,000 bibliographies over a six-month period, which will be made available to libraries that wish to participate.

List of tables, illustrations, graphs, etc. - It is clear to us that this type of data, which is often available along with the table of contents, will provide useful access and descriptive information. The critical task at this time is working with libraries and local system vendors to address display and indexing issues for this data.

Author Profiles - This is what we are calling an authority record "hybrid". The objective is to allow searching on specific kinds of authors with regional affiliations. It will contain such information as place of birth, current residence, areas of genre or subject expertise, ethnicity or cultural background, occupation, institutional affiliation, awards or honors, etc. Most early testing will involve booksellers; however, we believe libraries will also find a place for expanded author information in their catalogs.

Distribution & Access

Syndetics provides enrichment data directly to libraries and through marketing arrangements with suppliers of bibliographic services, local system vendors, and providers of web-based search software. The continuing growth and development of these arrangements is critical in allowing enrichment data to come into common use and to promote unique and creative uses of these data both in indexing and display among the many OPAC vendors and other possible outlets.

Libraries that receive enrichment data directly from Syndetics have complete control over determining exactly what types of enrichment data they wish to utilize (e.g., only tables of contents, cover images and reviews), in what format the data should be placed (e.g., MARC fields, HTML, XML, etc.), whether enrichment should be performed retrospectively or only on new titles, whether it should be limited to subsets of the collection (e.g., only juvenile materials) and how often enrichment should occur (e.g., weekly, monthly, quarterly, etc.).
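The choices listed above amount to a per-library profile. As a rough illustration only, and not a description of any actual Syndetics system, such a profile might be sketched as follows. The class name, field names and value spellings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnrichmentProfile:
    """Hypothetical per-library profile; every field name is illustrative."""
    data_types: List[str]                     # e.g. tables of contents, cover images, reviews
    output_format: str = "MARC"               # MARC fields, HTML, XML, etc.
    retrospective: bool = False               # False: enrich new titles only
    collection_subset: Optional[str] = None   # e.g. "juvenile"; None means whole collection
    frequency: str = "monthly"                # weekly, monthly, quarterly, etc.


def wants(profile: EnrichmentProfile, data_type: str,
          is_new_title: bool, subset: Optional[str] = None) -> bool:
    """Return True if an element of this type should be delivered under the profile."""
    if data_type not in profile.data_types:
        return False
    if not is_new_title and not profile.retrospective:
        return False
    if profile.collection_subset is not None and subset != profile.collection_subset:
        return False
    return True
```

A library that wanted only tables of contents, cover images and reviews for its juvenile collection would list those three data types and set the subset accordingly; everything else would be filtered out before delivery.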

While Syndetics is now providing enrichment data for several different types of library systems, mainly through MARC record enrichment, it does appear that we are moving into a transition period. The MARC format, the traditional "vessel" for holding such data for libraries, was designed as a well-structured bibliographic communications format, and it does not appear to be the best place for most, if not all, enrichment data.

Record and field size limitations, though not an issue for Syndetics, and probably not even for the MARC format itself, are certainly issues with local system vendors and bibliographic utilities. Screen display concerns for viewing a bibliographic citation with enrichment data are an even bigger issue because library users can be faced with the display of a "never-ending" record that contains more data elements than even the most patient users wish to view. As a result, Syndetics is now working with libraries and local system vendors to make these data elements accessible remotely from separate enrichment files which can be linked to a library's bibliographic record and displayed on an "as requested" basis. Presently, two approaches have been identified and put into practice.

Linking field embedded in the MARC record (e.g., 856)

Placing linking fields in MARC records for enrichment is easily accomplished, though some local systems presently have constraints on how various "buttons" will allow for displaying enrichment data from a linked file. While effective for viewing enrichment data, this approach also appears to have the disadvantage of not allowing the enrichment data to be searched in many catalogs. Most would agree that this is a serious drawback for many enrichment data elements, particularly tables of contents, annotations, author notes, bibliographies, and indexes. One remedy is to place some of the enrichment data in the MARC record and simply not display it (which most local systems can do), but this is certainly not an elegant solution. This approach appears to us to be a "transition solution" that will phase out as software advances occur. The second approach points to what is coming.
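To make the linking-field approach concrete, here is a minimal sketch of how 856 fields might be generated for one title. It assumes the standard MARC 21 subfields $3 (materials specified) and $u (URI), first indicator 4 (HTTP) and second indicator 2 (related resource). The host name, URL layout and label table are hypothetical, and the choice of second indicator is a local cataloging decision.

```python
from typing import Dict, List

# Hypothetical host and path layout; a real enrichment file would define its own.
BASE_URL = "https://enrichment.example.org"

# Illustrative display labels for the $3 subfield.
LABELS: Dict[str, str] = {
    "toc": "Table of contents",
    "summary": "Summary",
    "review": "Review",
}


def build_856(isbn: str, data_type: str) -> str:
    """Format one 856 field in mnemonic form: tag, indicators, subfields."""
    url = f"{BASE_URL}/{data_type}/{isbn}"
    # Indicators "42": HTTP access, related resource (an assumption, see lead-in).
    return f"856 42 $3 {LABELS[data_type]} $u {url}"


def linking_fields(isbn: str, data_types: List[str]) -> List[str]:
    """Build one linking field per enrichment type held for this title."""
    return [build_856(isbn, t) for t in data_types]
```

Note that the field carries only a pointer. The enrichment text itself stays in the remote file, which is exactly why it cannot be searched from the catalog under this approach.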

"Umbrella search" of the OPAC and Enrichment Files

"Umbrella search" software is now available which not only will search across multiple electronic files, but will locate, combine and display basic bibliographic information with corresponding enrichment data. Libraries implementing this approach eliminate the need for manipulation of the local catalog record by Syndetics or the library. This allows the catalog record to be, as Michael notes in his paper, the "center of the bibliographic galaxy" for the library while the enrichment data forms various "constellations". This approach also means that Syndetics is able to focus its efforts solely on the process of managing and continuously updating the enrichment files rather than continuously enriching many thousands of library catalogs. This is the more practical approach to making enrichment data available.

We also believe that the utilization of such software is particularly valuable as libraries seek to further refine the user's search experience for both printed and electronic information. This extends from the support of user search profiles ("My Library" concept) to the use of enrichment data to facilitate automated notification of related titles of interest. By utilizing this data in such a manner, we believe that the "tailored" library catalog will become much more of a reality. The ability to access such data remotely in HTML or XML formats will also allow libraries to customize displays through the use of library-defined style sheets (library "branding"?).
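The "umbrella search" idea described above can be sketched in a few lines: the catalog and the enrichment file stay separate, and the search layer matches them on a shared key at query time. This is only an illustration under stated assumptions, not the software referred to in the text. It assumes both files are keyed by ISBN and hold plain string values; the function name and record layout are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def umbrella_search(term: str,
                    catalog: Dict[str, Record],
                    enrichment: Dict[str, Record]) -> List[Dict[str, object]]:
    """Search catalog and enrichment data together; return combined hits."""
    needle = term.lower()
    hits: List[Dict[str, object]] = []
    for isbn, bib in catalog.items():
        # Enrichment lives in its own file; a title may have none.
        extra = enrichment.get(isbn, {})
        # Match against bibliographic fields and enrichment text alike.
        haystack = " ".join(list(bib.values()) + list(extra.values())).lower()
        if needle in haystack:
            # The catalog record is left untouched; enrichment is attached for display.
            hits.append({"isbn": isbn, **bib, "enrichment": extra})
    return hits
```

Because the match happens at search time, a table of contents added to the enrichment file becomes searchable without anyone editing the library's bibliographic record.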

Issues for Discussion

While libraries grapple with the considerable task of providing bibliographic control over the ever-expanding galaxy of information found on the Web, they should not lose sight of their own Milky Way, the local OPAC, and the collections contained therein, which many libraries have spent decades or even centuries building. As we expand both the amount and scope of available enrichment data, Syndetics continues to seek feedback from libraries and library researchers on enrichment usage. In particular:

Certainly, serious discussions, with the objective of establishing specific guidelines or standards, will assist vendors in responding to the needs of libraries in this regard. This is one of the reasons Syndetics is pleased to be participating in this important conference.

Syndetics does believe that libraries should aspire to and demand well-crafted, complete and timely bibliographic enrichment data. The attributes of the data must reflect these demands and be integrated in such a manner as to respect the considerable efforts that have been put forth by cataloging staff to maintain the integrity of the library catalog as an access tool for their users. While the online bookstore is often pointed to as a model for libraries to follow when considering the addition of bibliographic enrichment data, the comparisons end quickly when issues such as authority control and the dilution of search relevancy are closely examined.

Providers of enrichment data should bring to the task a considerable amount of experience in handling enrichment data and managing bibliographic files, being particularly aware of, and sensitive to, the many issues related to catalog maintenance which can sometimes be a source of conflict between the technical services and public services staff. However, we also hope that vendors and libraries are willing to experiment with enrichment data in unique and creative ways to help make the library catalog or "library portal" a more dynamic and effective information-seeking tool for their users. Syndetics welcomes the opportunity to assist libraries in this regard by working collaboratively with them along with content providers, international standards organizations, our marketing partners, and local system vendors in meeting their users' demands for such information. We are certain library users, both now and in the future, will demand nothing less.


Library of Congress
December 19, 2000