April 20, 2001
Membership: Diane Boehr (NLM), Robert Bremer (OCLC), James Castrataro (Indiana), Tad Downing (GPO), Ed Glazier (RLG), Beth Guay (Maryland), Ruth Haas (Harvard), Lynne Howarth (IFLA Review Group liaison), Wayne Jones (MIT, interim chair Oct-Dec 2000), Judy Knop (ATLA, liaison to CC:DA major/minor task force), George Prager (Brooklyn Law School), Regina Reynolds (LC/ISSN), John Riemer (UCLA, chair), Cecilia Sercan (Cornell)
I. Summary/Introduction--What's at stake
The more that our cataloging policy calls for splitting e-resource versions onto separate bibliographic records,
- the greater the cost to copy-cataloging operations in sifting through a proliferating array of bibliographic records to find the "correct" one to work with and
- the greater the corresponding multiplicity of records that OPAC users must deal with.
The more that our cataloging policy calls for lumping e-resource versions onto a single bibliographic record,
- the greater the complexity involved in identifying and using portions of those records relating to a single version, and
- the greater the complexity involved in identifying and removing data that is no longer current.
II. First charge
"Identifying the most common types of versions and reproductions for textual resources currently available and the most bibliographically significant characteristics of each."
We identified a typology of the most commonly occurring differences among manifestations of e-resources; for each, cataloging guidance is needed on the number of records required to accommodate the difference.
1. What to do when the only difference is the setting (individual institution or consortium) in which the resource is available, e.g. the GALILEO and the OhioLink representation of the same provider's resource.
2. What to do when the only difference is the provider of the resource.
3. What to do when the difference lies among the particular file formats, e.g. ASCII, HTML, PDF, etc.
4. What to do when the resource provider is also responsible for digitizing the material.
5. What to do when a given resource provider contributes special added value, such as indexing across time or across multiple titles simultaneously.
6. What to do when the difference is in access restrictions, e.g. paid-for versus freely available, or available on campus only versus off campus as well.
7. What to do when the difference is in "holdings," i.e. the range of material available.
8. What to do when the difference is in "content," e.g. limited full text, text without images, text plus images, etc.
9. What to do when the difference is in packaging, e.g. one version is part of a rather large aggregation and another version is not.
10. What to do when equal amounts of the material are available in multiple languages from what amounts to a single source.
11. What to do when the versions differ in title.
12. What to do when the difference lies in the formatting of the data, i.e. discrete serial issues in one version and text amalgamated into a database in another.
13. What to do when standard identifiers differ, e.g. URLs.
III. Second charge
"Recommending best practices for each, taking into account both cataloging staff and workload levels. Consideration should be given to:
- Methods of description that provide for clarity, efficiency, and low-maintenance
- Differing needs of shared and local catalogs
- The advisability of the single record technique (noting the existence of the electronic on the record for the print) and its applicability to monographic resources."
Obviously, one manifestation is not going to be identical to another; if it were, it would be the same thing rather than a "mere" manifestation. The challenge lies in making guidelines about how much has to be added to or changed within a resource for it to become what one might term a different manifestation.
Under current cataloging rules, if we regard the provider as a publisher, then we would regard each new provider as a new manifestation; to task force members, however, a provider most often seems to resemble a distributor. If such a provider were also responsible for having digitized the resource, it would strike catalog users as rather odd to single out that one provider for a separate bibliographic record while consolidating all other providers on another record. Who does the digitizing can be significant: a digitizer who is also the original publisher of a print version is increasingly able to add sound or image files and/or additional articles/data files that would not fit in the print version.
The cases where task force discussion was closely divided were content differences (#8) and the presence of (parallel) multilingual content (#10). The degree to which e-version content can vary ranges from provision of multi-volume/cross-title indexing to subset relationships (some articles/images present in one version but not in another) to completely different content between versions. The most significant of the factors in section II are content differences, and the greater the degree of content difference the more separate bibliographic records seem warranted. Separate records are more warranted when multilingual versions available from a single provider are not all freely available or offered in the same package, but instead require separate subscriptions.
In its discussions to date, for each of the other cases, a majority of the task force is inclined to consolidate multiple e-versions on a single record.
The greater the variety of sources for e-version records (commercial and noncommercial), the more support and interest there is among task force members for separate records, for the ease of removing sets of records from the catalog.
The Task Force is aware of the interim CONSER policy issued on October 25, 2000, which allows for noting the existence of multiple e-versions on a single print record but calls for each e-version to be described on its own separate bibliographic record. If it later proves easier to collapse data than to separate it, thought will need to be given to which fields in the separate bibliographic records would facilitate the eventual linking or consolidation of data.
There are different and conflicting needs between local catalogs and shared catalogs. A local catalog is intended to contain descriptions of those manifestations to which the local population has access. From the end-user perspective, it is probably clearest if the local catalog contains as few records as possible for the same bibliographic entity, regardless of manifestation. That is why in many cases, it is most helpful to describe the various manifestations in a single record. If this is done, it must be clear to local staff and local users what the locally available manifestations are and how to access them.
Records in a shared database, like a regional union catalog or bibliographic utility, serve many functions: a source for shared cataloging copy, acquisitions data, interlibrary loan, and reference and "discovery" for both staff and end-users. In a master record database like OCLC, if a single record is used for multiple manifestations at all the holding institutions, shared cataloging requires detailed review and editing whenever the local institution does not own or have access to all the manifestations represented in the record. If records for consolidated e-versions were to become commonplace, thought would have to be given to how ILL, acquisitions, or reference staff would clearly identify the version they were interested in borrowing, ordering, or accessing. In the RLIN/Eureka context, clustering records together becomes more complex when some descriptions describe one manifestation and some describe another.
A contributing factor to the complexity may well be the partial integration of holdings and location characteristics ("362 1 Coverage as of ..." and the contents of 856 fields) into bibliographic records. If users learn they can depend on finding all manifestations clustered in a set of records, they will be more likely to take the time to look for and examine the individual records in the set. This is the collocating function of the catalog. In choosing to distinguish the remote electronic secondary manifestations, we may be hindering that function; we have to find ways to continue providing this service to our users.
The paper Michael Kaplan recently presented to the Library of Congress' Conference on Bibliographic Control in the New Millennium inspires hope that we might, with the aid of technology, harmonize the public service need for unity of display among related e-versions with the expediency of behind-the-scenes technical processing involving discrete records.
IV. Third charge
"Defining principles by which determinations about whether to create single or separate records for versions and reproductions can be made in order to facilitate future decisions when new types of version are published (e.g., Palm Pilot editions)."
Some possible criteria on which to base decisions include:
- Cost to the cataloging process of sorting through multiple related records for the one to use; the parallel burdens on the OPAC user in sorting through multiple hits, and on the union catalog user in determining what information in a record to ignore.
- The highly changeable nature of electronic resources (more volatile, ephemeral than print serials)
- Content versus carrier: the relative importance of each.
- Discernible, significant differences in resource content
- Whether an item can be considered a "separate publication"
- What the user is likely to construe as a single entity, regardless of individual content differences
- The relative ease of consolidating versus separating records/data after-the-fact (is it easier to delete irrelevant data?)
- Clarity of data content within a single bibliographic record that attempts to describe multiple e-versions (can one tell easily what is being said about any given provider?)
- The threshold for needing a separate record in other environments, e.g. ISSN assignment
(In current ISSN practice, when rights to the primary product are licensed or sold to another publisher, the presumption is that this is a secondary manifestation, which does not warrant a separate ISSN assignment. The possibility for content difference in this situation is considered much lower than in the case of a publisher who offers both a print and a digital version; in the latter situation the two versions are regarded as different editions or separate products.)
- Degree of overall involvement and/or intellectual responsibility a provider has in a resource---offering access to it (not much), digitization/publication (some), provision of added value such as cross-publication searching and auxiliary background material (a lot)
- When, if at all, monographic resources warrant separate solutions from those we might recommend for serials/continuing resources
- Whether aggregated resources are a special situation, possibly warranting a different treatment.
We solicited ideas from the bibliographic community by posting messages about the task force's work to various online discussion lists in fall 2000: SERIALST, DIG_REF, AUTOCAT, OLAC, DIG-LIB, CORC-L. Readers of this report with additional ideas and suggestions are encouraged to submit them to any task force member listed at: http://www.loc.gov/catdir/pcc/tgmuler.html.
A short article about our task force appeared in the November 2000 issue of D-Lib Magazine, contributed by Wayne Jones. At least one task force member has been asked to talk about the group's work at the next ALA Annual meeting.
The task force's home page is located at: http://www.loc.gov/catdir/pcc/tgmuler.html
V. Further Discussion and Possibilities for Future Action
An interim (CONSER) policy exists and further discussion of the single- and separate-record approaches in this group reveals insufficient consensus to support advocating a different course. In a similar manner, the Feb. 26th, 2001 report of CC:DA Task Force on an Appendix of Major and Minor Changes (http://www.ala.org/alcts/organization/ccs/ccda/tf-appx4.doc) seems disposed toward the creation of separate records in a large number of cases.
The emergence of aggregators* accounts for a high percentage of the multiple manifestations that catalogers confront. In discerning differences among the various kinds of aggregations, the most significant one in the eyes of the SCA Aggregators task group was the quantity of titles covered. If an aggregation contains fewer than 200-300 titles, a library can realistically keep up with the cataloging; beyond 1,000 titles, the vendor's cooperation is critical. So far, the larger aggregations have proved the more volatile, and only the provider is in a position to keep up with added and dropped titles and the latest volume coverage available for the retained titles.
Another meaningful distinction exists between publisher-based or special project-based aggregations versus those which involve the licensing of copyrighted materials. The former seldom if ever drop titles. The electronic offerings of OCLC WorldCat Collection Sets (http://www.stats.oclc.org/cgi-bin/db2www/wcs/wcs_cols.d2w/Electronic) appear to represent this more stable type of aggregation exclusively.
In view of the interest among libraries in minimizing duplication of records, there are a number of potential solutions:
- Dedup and add URLs to bib records for hard-copy versions (California State University, Northridge did this for Ebsco aggregator records and now thinks better of that decision.)
- Dedup and add URLs to holdings records for hard copy versions (University of Washington is doing this in its Innovative OPAC.)
- Collect the monthly and weekly updates from all sources and then dedup and consolidate them locally, just prior to loading them to the OPAC. (This way the proliferation of additional records for the same title is held to a mere single record representing all e-versions. No example institutions currently employing this tactic are known.)
- For all aggregations cataloged by the library community, consolidate all the coverage onto a single record in the bibliographic utility. (If a utility uses the master record model, participants in cooperative cataloging projects would have to be equipped with relatively powerful authorizations, the kind which can lock and replace records.)
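The third option above, deduplicating and consolidating record updates locally before loading them to the OPAC, can be sketched in code. The record structure, field names, and URLs below are illustrative assumptions (no institution's actual load format is implied): incoming e-version records are grouped on a match point such as ISSN or a normalized title, and their access URLs (the kind of data carried in MARC 856 fields) are merged onto a single record.

```python
# Hypothetical sketch of the "dedup and consolidate locally" tactic:
# e-version records from several sources are grouped on a match key and
# their access URLs are merged onto one record before the batch loads.
from collections import defaultdict

def match_key(record):
    """Prefer the ISSN as a match point; fall back to a normalized title."""
    issn = record.get("issn")
    if issn:
        return ("issn", issn.replace("-", ""))
    return ("title", " ".join(record["title"].lower().split()))

def consolidate(records):
    """Collapse multiple e-version records for the same title into one,
    keeping the first description seen and accumulating all URLs."""
    merged = {}
    urls = defaultdict(list)
    for rec in records:
        key = match_key(rec)
        merged.setdefault(key, dict(rec, urls=[]))
        for url in rec.get("urls", []):
            if url not in urls[key]:
                urls[key].append(url)
    for key, rec in merged.items():
        rec["urls"] = urls[key]
    return list(merged.values())

# Two provider records for the same ISSN collapse to one record
# carrying both URLs; the unrelated title stays separate.
batch = [
    {"issn": "1234-5679", "title": "Example Journal",
     "urls": ["http://providerA.example/ej"]},
    {"issn": "1234-5679", "title": "Example Journal",
     "urls": ["http://providerB.example/ej"]},
    {"title": "Another Title", "urls": ["http://providerC.example/at"]},
]
result = consolidate(batch)
```

A real implementation would of course work against MARC records and a vendor's update files; the point of the sketch is only that a stable match point (ISSN, normalized title) is what makes the consolidation mechanical.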
In the meantime, at a number of ALA Midwinter meetings great interest was displayed in pursuing the idea Michael Kaplan has put forward, i.e. unity of display for e-versions in the public services setting while retaining separate records behind the scenes for technical services purposes. Possible ways to pursue this include examining what potential content in records for multiple e-versions could support a programming effort to successfully identify the members of a "bibliographic family" and thus have a chance to pull together a unified display. Potential strategies are (1) inclusion of some linkage in existing bibliographic records such as 760-787 fields and (2) creation of some kind of meta-record that would explicitly list the control numbers of records for what are considered to be equivalent manifestations. (Judgments would still need to be made about what belongs in the same cluster of records and what properly goes into a separate cluster.)
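The second strategy above, a meta-record explicitly listing the control numbers of equivalent manifestations, amounts to clustering records into "bibliographic families." A minimal sketch, assuming hypothetical control numbers and a meta-record that is simply a list of control numbers: overlapping lists are unioned into disjoint clusters, which a public display could then present as one entry.

```python
# Hypothetical sketch of meta-record clustering: each meta-record lists
# control numbers judged to be equivalent manifestations; overlapping
# lists are unioned (union-find) into disjoint "bibliographic families."

def cluster_families(meta_records):
    """Union overlapping control-number lists into disjoint clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for numbers in meta_records:
        for n in numbers:
            union(numbers[0], n)

    clusters = {}
    for n in parent:
        clusters.setdefault(find(n), set()).add(n)
    return [sorted(c) for c in clusters.values()]

# Two meta-records share ocm222, so all four control numbers are judged
# to belong to one family and would receive a single unified display.
families = cluster_families([["ocm111", "ocm222"],
                             ["ocm222", "ocm333", "ocm444"]])
```

The judgment calls the report mentions (what belongs in the same cluster) live entirely in how the meta-records are built; the clustering itself is mechanical once those lists exist.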
As presented at the April 2001 Joint Steering Committee meeting, Jennifer Bowen will lead a task force that will experiment with new ways to deal with multiple versions. She will be working with representatives from each of the JSC constituents and with OCLC Europe. The members of this PCC task force look forward to the results of the JSC group with great interest.
Compiled by John J. Riemer
The helpful ideas and suggestions of Valerie Bross and Jean Hirons are gratefully acknowledged.
*An aggregator is an e-resource supplier which provides access to full-text collections of e-journals. The user may browse by title or issue, or may search tables of contents or full text. Some authors limit use of the term "aggregator" to a supplier of e-journals from various publishers; others use it to refer to any supplier of a collection of e-journals. Examples: Project Muse, JSTOR, Catchword. (For use of the term, compare http://toltec.lib.utk.edu/~colldev/annual98.html with http://www.library.ucsb.edu/istl/00-summer/article2.html.)
An aggregator database is a resource that provides access [through a database interface] to journal articles without necessarily providing access to the whole issue of a journal (based on the definition given at http://elibrary.unm.edu/Ejournal/index.shtml). Aggregator databases usually license content from copyright holders rather than owning all content; this results in volatility of holdings over time. Examples: Academic Universe, ProQuest Education Complete, ProQuest Research II, and ABI/Inform Global. http://www.library.ucsb.edu/istl/00-summer/article2.html.
A database is a collection of logically interrelated data stored together in one or more computerized files, usually created and managed by a database management system (Subject Cataloging Manual H 1520).