Good afternoon. Michael Kaplan asked me to speak a little about the aggregator challenge with regard to bibliographic control of electronic resources. I will cover that briefly, but I also want to add a few other comments about the more general topic of this conference - from the view of a librarian working for a vendor, but with the emphasis on "librarian". When I speak about vendors, it will be in the more generic sense, including ILS and other vendors. I will also touch on the role of our publisher colleagues.
I'd like to start by briefly reviewing a few underlying assumptions for my comments, many of which we have heard at this Conference. Unquestionably, new business models for acquisition and access will require different levels and sources of cataloging, or metadata. Articles, websites, pre-print services - all of these will need controlled, consistent, integrated access. Libraries cannot do it all and must select where to focus their resources. Practical, scalable solutions need to be found, such as those outlined by Regina Reynolds in her excellent paper. Michael Kaplan is on target with his concept of a core bibliographic record enriched with contributed data from a number of sources, including publishers and vendors. And there is no question that libraries can, and must, benefit from appropriate partnerships with publishers, vendors, search engines, Amazon.com, and many other commercial enterprises.
So what are the specific issues surrounding aggregators and bibliographic control? Aggregations have proven to be a cost-effective method of providing widespread access to the full text of e-journals. Although there has also been widespread concern about bundling together a large number of titles that haven't been "selected" in the traditional sense, some recent studies, including one at OhioLINK, are showing that the titles that were not previously selected for a library's collection are receiving as much use as, if not more than, the previously selected titles. So aggregations are very much here to stay.
The nature and size of aggregations can vary widely in content, coverage and business model, so it is difficult to apply a uniform set of processes or standards for bibliographic control of these collections. One title can be part of multiple aggregations, which only exacerbates the multiple version problem. Titles move in and out of aggregations, often without notice to the aggregators themselves. Libraries want to integrate access to titles in aggregations with their other resources, and patrons need to be able to link to the "appropriate copy" of the title to which they have authorized access.
Ideally, records for the individual titles should be created once and used by many; we need records for both the titles and electronic holdings. These records also need to be maintained, updated, deleted, etc. And, regardless of the source of the record, we need standards and some agreed-upon level of quality enforcement.
Current methods of handling access to aggregator titles vary widely from library to library. Some offer no access to the individual titles, but simply a record for the aggregation itself in the OPAC - hardly a satisfactory solution. Others offer a web page link to the aggregation. More often, we find a web page link from the e-journal title to the aggregation. Links to the Jake project at Yale are becoming more prevalent - from either the OPAC or a web page, or both. Among those libraries offering access to individual e-journal titles from the OPAC, some maintain multiple records, one for each version or aggregation. Others use a single record containing holdings for multiple versions. The publisher URL (or a durable URL) is displayed either in the bibliographic portion of the record or with the appropriate holdings information (this latter approach is much clearer to the user and, I hope, will become more widespread).
There are also multiple sources of records for aggregated titles currently being used in libraries. Some are locally cataloged and maintained, although it is generally agreed that this is not a scalable solution. There are also cataloged sets from OCLC contributed by participating libraries. Some consortia offer cataloging records for titles held jointly, such as the NESLI group in the UK. Sometimes vendor lists from websites are run through a MARC program on a regular basis and loaded into the ILS. Others use Jake records downloaded in MARC format. Then there are aggregators who are able to offer MARC records, either themselves or by using a commercial MARC service.
What are some of the challenges faced by aggregators in attempting to provide records for their collections? Libraries that purchase one aggregation may not all have access to the same set of titles; often there are different packages within aggregated sets. In the case of RoweCom's Information Quest, for instance, access is provided only to those titles for which the library has a licensed subscription with the publisher. So one solution doesn't fit all. Processing bibliographic records routinely and mechanically for aggregator titles has its pitfalls. Aggregators don't as a rule examine an electronic title to determine whether it is simply an electronic version of a print title or whether there is in fact new content. Sometimes there is confusion about whether the site actually offers full text or simply contains abstracts.
The aggregator may need to create a MARC record if no print equivalent exists. Higher-level staff and training are often necessary. I would actually support an option for the aggregator to "outsource" the cataloging back to libraries for payment. In this way, both parties would benefit. The process of creating bibliographic records needs to be cost-effective for the aggregator. Be assured that the cost will be passed on, bundled or not. And indeed the library community does need to pay fairly for added value services.
There are other difficulties inherent in relying on the vendor for bibliographic records for their collections. Often the management and priorities at the vendor change and resources are no longer available for the cataloging project. Aggregators need improved monitoring to know when a publisher or title drops off or changes, or when the format specifications for issues change. They need a way to easily maintain the data - deletions, holdings, coverage, etc. - and to do it centrally and consistently regardless of which ILS is involved at a particular site.
Many of you are probably familiar with the work of the PCC Standing Committee on Automation Task Group on Journals in Aggregator Databases, so I won't cover this in depth. Following a CONSER survey which demonstrated that the majority of respondents wanted vendor-supplied cataloging records for electronic titles in aggregator sets available in the OPAC, the Task Group was charged with proposing the content of a vendor-supplied record for an "aggregator analytic" and mounting a demonstration project. They were also to make recommendations for maintenance and updating. A final report was issued in January 2000 and this contained practical solutions for the issue at hand. A new task group has been formed to continue the work, dealing with record sets, e-books, communication, increased work with vendors, and more.
The PCC Task Group determined that the type of record should depend on the number of titles in an aggregator database. For a very small number of records, human-created analytics were best in terms of quality, but not scalable for larger collections. The second-best solution consisted of machine-derived analytics from the print version of the serial and assumed the availability of the necessary cataloging records. Beyond 200-300 titles, the machine-derived solution was selected as the best option. Other choices considered were machine-generated analytics which rely on defaults, scripted creation of minimal records, and a single combined coverage index, like Jake. Although some vendor-supplied aggregator records have been available and more are becoming available, their use is still disappointingly minimal. It is hoped that this volume will increase in the future.
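As a rough illustration only, the Task Group's rule of thumb can be written as a few lines of Python. The function name and the default cut-off of 200 titles are assumptions made for this sketch; the report as summarized here gives a range of 200-300 titles rather than a single figure, and does not say exactly where "a very small number" ends.

```python
def recommend_record_type(title_count: int, cutoff: int = 200) -> str:
    """Suggest a record type for an aggregator database of a given size.

    `cutoff` is a hypothetical parameter: the summary above mentions
    200-300 titles, so a site would pick its own value in that range.
    """
    if title_count < 0:
        raise ValueError("title_count must not be negative")
    if title_count < cutoff:
        # Small collections: best quality, but does not scale.
        return "human-created analytics"
    # Larger collections: derive records from the print-version cataloging.
    return "machine-derived analytics"
```

For example, a 50-title collection would be assigned human-created analytics under this sketch, and a 500-title collection machine-derived ones.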
It appears that we need to come up with a more central and granular solution for these records. Michael Kaplan is correct in wishing for an EDI-like solution for vendors to update holdings and URLs on a timely and straightforward basis. We need to do this once in a central database and then have the holding libraries notified about changes, or find a standardized way to send automatic updates to every ILS to update selected portions of the record. We have been successful in loading and updating EDI invoices; if we can do it and trust the process where money is involved, surely we can come up with consistent match points and a process to update holdings and URLs. We also need to increase the level of granularity of OPAC access; one ILS vendor has shown interest in receiving from us a SICI-like string including a durable URL - to create electronic holdings which can then link to the table of contents at the issue level of an electronic title. I would recommend that we form a SISAC-like group including librarians, ILS vendors, utilities, aggregators and publishers - to find solutions and work together to implement them quickly.
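To make the idea of "consistent match points" concrete, here is a minimal Python sketch of how a centrally issued update might be applied to a library's local holdings. Everything in it is an assumption for illustration: the `HoldingsUpdate` fields, the choice of ISSN plus aggregator name as the match point, and the dictionary used to stand in for the local ILS. It does not describe any existing EDI message or ILS interface.

```python
from dataclasses import dataclass


@dataclass
class HoldingsUpdate:
    issn: str            # hypothetical match point, part 1
    aggregator: str      # hypothetical match point, part 2
    coverage: str        # e.g. "1995-2000"
    url: str             # publisher or durable URL
    deleted: bool = False  # True when the title has left the aggregation


def apply_updates(local_holdings: dict, updates: list) -> list:
    """Apply central updates to local holdings keyed by (issn, aggregator).

    Only the coverage and URL portions of a matched record are touched.
    Updates that match no local record are returned for human review.
    """
    unmatched = []
    for update in updates:
        key = (update.issn, update.aggregator)
        if key not in local_holdings:
            unmatched.append(update)
            continue
        if update.deleted:
            del local_holdings[key]
        else:
            local_holdings[key]["coverage"] = update.coverage
            local_holdings[key]["url"] = update.url
    return unmatched
```

The point of the sketch is the design choice, not the code: the vendor maintains the data once, each library's system changes only selected portions of a record, and anything that fails to match is set aside rather than guessed at.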
Let's take a moment to look briefly at some issues surrounding publisher responsibility for metadata in the future. For the first time, publishers are realizing the intrinsic value of their metadata in relation to e-commerce applications and will be more interested in solutions which lead to increased sales of their publications. The results of bad data and errors will be more readily apparent and have a negative impact on sales. So we have an important opportunity here, as publishers will need to begin collecting metadata in a more standardized form from their authors. We should actively participate with publishers to ensure that they will be distributing this metadata for titles, articles, chapters and related names and works in a standardized and consistent format.
Publishers should have increased responsibility as well for the quality, accuracy and updating of bibliographic and other metadata. They should be informing us immediately or before the fact about title changes and holdings coverage changes. And publishers should be partnering with libraries and vendors to ensure consistency and quality of their data. Use of library authority files and authority processing would benefit the publisher and the user of all online services. Publishers are also providing increasingly enriched data - tables of contents, resource links, author biographies, issue dispatch data, rights management information, etc. Libraries need to be actively involved in the standards, such as ONIX, which are being developed to deal with these new types of metadata. This is one particular partnership with standards groups and publishers that should be explored immediately.
What are some of the vendor roles in creating and dealing with metadata for electronic resources? Vendors should be creating the umbrella systems for resource discovery, integrating both local and networked resources. And they should be doing this with much library input. They should be developing and applying technological solutions for bibliographic control and record enrichment. Vendors should be actively partnering with library, publisher and other groups/vendors in standards development. They need to be encouraging and publicizing the use of library-defined standards by publishers and authors. When appropriate, vendors should work to provide and maintain standardized metadata (cataloging data) for their collections. And they should be providing enriched data and links from the standard bibliographic record.
In preparation for our topical breakout sessions, I wanted to see if there were some lessons that we as librarians could learn from the commercial sector and keep in mind while carrying out this daunting task! This is the list I've come up with:
For the first time, libraries are facing serious competition in their traditional functions and areas of expertise. Unfortunately, the Internet is now the first place that many audiences turn to for research; libraries do not yet have an obvious place on the Net nor are they the first place that the average user now thinks of for information access. It is time to actively work to regain and retain our market share! If we have to borrow some tactics from our "competitors" to be successful, so be it!
The commercial world is now creating partnerships left and right; companies can no longer go it alone. Yesterday's competitor is today's strategic partner.
The commercial sector constantly performs cost/benefit analyses. And we librarians need to do that as well. We have to choose what's important and identify those areas which are less important and where perfection is not in fact necessary. I was struck yesterday by Barbara Tillett's talk on the new possibilities for authority control and thinking that if we could be successful in working with publishers and search engines to implement some of these features, this would be really significant - in my mind, much more significant than worrying about exact transcription or correcting someone else's cataloging copy. Something needs to give - let's concentrate on where we can do the most good and have the most positive impact.
As has been said multiple times in different ways during the last couple of days, libraries need to learn how to better market themselves and their knowledge and skills! We must be assertive and prove our added value in ways as concrete as possible. We have so much expertise in resource evaluation, authority control, cataloging, access! At the Charleston Conference recently, we heard that the new ONIX standards were being developed with little or no involvement from the library community. Sitting in a roomful of librarians, I couldn't believe that no one was angry enough to stand up and ask why... We must demand to be heard and to be involved!
We need to take risks and experiment - it's not a matter of life and death. If a project doesn't work out well, so be it! Some will work and we'll be the better for it! We no longer have the luxury of planning out every last detail and ensuring that an idea won't fail along the way... And we need to provide or secure funding for these experiments, as Jane Greenberg said earlier today.
Forecasting - this is a hard one - we need to try to predict the future and to look ahead as much as we look at current problems. We need to anticipate future challenges and design our solutions to be flexible enough to meet future needs. Good luck to us all!
Time To Market!
And, finally, we need to be concerned about time to market! The world won't wait. And we don't want to be bypassed because we can't make quick decisions or because we're perceived as bogging down a process.
The library community needs to formulate solid, practical, immediate action plans, including partnerships with the commercial sector, and put them into motion in order to deal with the many challenges facing us. Thank you.
December 19, 2000