PCC Task Force on Multiple Manifestations of Electronic Resources
Final report
April 20, 2001
Membership: Diane Boehr (NLM), Robert Bremer (OCLC), James Castrataro (Indiana),
Tad Downing (GPO), Ed Glazier (RLG), Beth Guay (Maryland), Ruth Haas (Harvard),
Lynne Howarth (IFLA Review Group liaison), Wayne Jones (MIT, interim chair
Oct-Dec 2000), Judy Knop (ATLA, liaison to CC:DA major/minor task force), George
Prager (Brooklyn Law School), Regina Reynolds (LC/ISSN), John Riemer (UCLA,
chair), Cecilia Sercan (Cornell)
I. Summary/Introduction--What's at stake:
The more that our cataloging policy calls for splitting e-resource
versions onto separate bibliographic records,
--the greater the cost to copy-cataloging operations in sifting through a proliferating
array of bibliographic records to find the "correct" one to work with and
--the greater the corresponding multiplicity of records that OPAC users must
deal with.
The more that our cataloging policy calls for lumping e-resource
versions onto a single bibliographic record,
--the greater the complexity involved in identifying and using portions of
those records relating to a single version, and
--the greater the complexity involved in identifying and removing data that
is no longer current.
II. First charge:
"Identifying the most common types of versions and reproductions for textual
resources currently available and the most bibliographically significant characteristics
of each."
We identified a typology of the most commonly occurring differences among
manifestations of e-resources, for which cataloging guidance is needed on the
number of records required to accommodate those differences.
1. What to do when the only difference is the setting (individual institution
or consortium) in which the resource is available, e.g. the GALILEO and the
OhioLink representation of the same provider's resource.
2. What to do when the only difference is the provider of the resource.
3. What to do when the difference lies among the particular file formats,
e.g. ascii, html, pdf, etc.
4. What to do when the resource provider also is responsible for digitizing
the material.
5. What to do when a given resource provider contributes special added value,
such as indexing across time or across multiple titles simultaneously.
6. What to do when the difference is in access restriction, e.g. paid-for
versus freely available or available on campus only versus off campus also.
7. What to do when the difference is in "holdings," i.e. the range of material
available.
8. What to do when the difference is in "content," e.g. limited full text
or text without images or text + images, etc.
9. What to do when the difference is in packaging, e.g. one version is part
of a rather large aggregation and another version is not.
10. What to do when (equal amounts of) the material is available in multiple
languages from what amounts to a single source.
11. What to do when the difference is in the title the versions have.
12. What to do when the difference lies in the formatting of the data, i.e.
discrete serial issues in one version and amalgamation of text into a database
in another.
13. What to do when standard identifiers differ, e.g. URLs.
III. Second charge:
"Recommending best practices for each, taking into account both cataloging
staff and workload levels. Consideration should be given to:
- Methods of description that provide for clarity, efficiency, and low-maintenance
- Differing needs of shared and local catalogs
- The advisability of the single record technique (noting the existence
of the electronic on the record for the print) and its applicability to monographic
resources."
Obviously, one manifestation is not going to be identical to another, for,
if it were, it would be the same thing rather than a "mere" manifestation.
The challenge is in making guidelines about how much has to be added to or changed
within a resource for it to become what one might term a different manifestation.
Under current cataloging rules, if we regard the provider as a publisher,
then we would regard each new provider as a new manifestation; however, to task
force members a provider most often seems to resemble a distributor. If such a
provider were also responsible for having digitized the resource, it would
strike catalog users as rather odd to single out one particular provider for
a separate bibliographic record and to consolidate all other providers on another
record. Who is doing the digitizing can be significant, when one considers
that a digitizer who is also an original publisher of a print version is increasingly
able to add sound or image files and/or additional articles/data files that
would not fit in the print version.
The cases where task force discussion was closely divided were content differences
(#8) and the presence of (parallel) multilingual content (#10). The degree
to which e-version content can vary ranges from provision of multi-volume/cross-title
indexing to subset relationships (some articles/images present in one version
but not in another) to completely different content between versions. The most
significant of the factors in section II are content differences, and the greater
the degree of content difference the more separate bibliographic records seem
warranted. Separate records are more warranted when multilingual versions available
from a single provider are not all freely available or offered in the same
package, but instead require separate subscriptions.
In its discussions to date, for each of the other cases, a majority of the
task force is inclined to consolidate multiple e-versions on a single record.
The greater the variety of sources for e-version records (commercial and
noncommercial), the more support and interest there is among task force members
for separate records, for the ease of removing sets of records from the catalog.
The Task Force is aware of the interim CONSER policy issued on October 25, 2000,
which allows for noting the existence of multiple e-versions on a single print
record but calls for each e-version that is described on its own to receive a
separate bibliographic record. If
later on it is easier to collapse data than to separate it, thought will
need to be given to which fields in the separate bibliographic records will
facilitate the eventual linking up or consolidation of data.
There are different and conflicting needs between local catalogs and shared
catalogs. A local catalog is intended to contain descriptions of those manifestations
to which the local population has access. From the end-user perspective, it
is probably clearest if the local catalog contains as few records as possible
for the same bibliographic entity, regardless of manifestation. That is why
in many cases, it is most helpful to describe the various manifestations in
a single record. If this is done, it must be clear to local staff and local
users what the locally available manifestations are and how to access them.
Records in a shared database, like a regional union catalog or bibliographic
utility, have many functions: a source for shared cataloging copy, acquisitions
data, interlibrary loan, and reference and "discovery" both for staff and end-users.
In a master record database like OCLC, if a single record is used for multiple
manifestations at all the holding institutions, shared cataloging requires
detailed review and editing if the local institution does not own or have access
to all the manifestations represented in the record. If records for consolidated
e-versions were to become commonplace, thought would have to be given to how ILL,
acquisitions, or reference staff would clearly identify a version they were
interested in borrowing, ordering, or having access to. In the RLIN/Eureka
context, clustering records together becomes more complex when some descriptions
describe one manifestation and some describe another.
A contributing factor to the complexity may well be the partial integration
of holdings and location characteristics ("362 1 Coverage as of ..." and contents
of 856 fields) into bibliographic records. If users learned they could depend
on finding all manifestations clustered in a set of records, they would be more
likely to take the time to look for and examine the individual records in the set.
This is the collocating function of the catalog! In choosing to distinguish
the remote electronic secondary manifestations, we may be hindering that function.
We have to find ways to provide this service to our users.
The paper
Michael Kaplan recently presented to the Library of Congress' Conference
on Bibliographic Control in the New Millennium inspires hope that we
might, with the aid of technology, harmonize the public service need for
unity of display among related e-versions with the expediency of behind-the-scenes
technical processing involving discrete records.
IV. Third charge:
"Defining principles by which determinations about whether to create single
or separate records for versions and reproductions can be made in order to
facilitate future decisions when new types of version are published (e.g.,
Palm Pilot editions)."
Some possible criteria on which to base decisions include:
- Cost to the cataloging process of sorting through multiple related records
for the one to use; parallel burdens to the OPAC user in sorting through
multiple hits, and to the union catalog user determining what information
in a record to ignore.
- The highly changeable nature of electronic resources (more volatile, ephemeral
than print serials)
- Content versus carrier: the relative importance of each.
- Discernible, significant differences in resource content
- Whether an item can be considered a "separate publication"
- What the user is likely to construe as a single entity, regardless of
individual content differences
- The relative ease of consolidating versus separating records/data after-the-fact
(is it easier to delete irrelevant data?)
- Clarity of data content within a single bibliographic record that attempts
to describe multiple e-versions (can one tell easily what is being said about
any given provider?)
- The threshold for needing a separate record in other environments, e.g.
ISSN assignment
(In current ISSN practice, when rights to the primary product are licensed
or sold to another publisher, the presumption is that this is a secondary
manifestation, which does not warrant a separate ISSN assignment. The possibility
for content difference in this situation is considered much lower than in
the case of a publisher who offers both a print and a digital version; in
the latter situation the two versions are regarded as different editions
or separate products.)
- Degree of overall involvement and/or intellectual responsibility a provider
has in a resource---offering access to it (not much), digitization/publication
(some), provision of added value such as cross-publication searching and
auxiliary background material (a lot)
- When, if at all, monographic resources warrant separate solutions from
those we might recommend for serials/continuing resources
- Whether aggregated resources are a special situation, possibly warranting
a different treatment.
V. Outreach:
We solicited the ideas of people in the bibliographic community by posting
messages in fall 2000 about our task force's work to various online discussion
lists: [SERIALST, DIG_REF, AUTOCAT, OLAC, DIG-LIB, CORC-L]. Readers of this
report with additional ideas and suggestions are encouraged to submit them
to any task force member listed at: http://www.loc.gov/catdir/pcc/tgmuler.html.
A short
article about our task force appeared in the November 2000 issue of D-Lib
Magazine, contributed by Wayne Jones. At least one task force member
has been asked to talk about the group's work at the next ALA Annual meeting.
The task force's home page is located at: http://www.loc.gov/catdir/pcc/tgmuler.html
VI. Further Discussion and Possibilities for Future Action
An interim (CONSER) policy exists, and further discussion of the single- and
separate-record approaches in this group reveals insufficient consensus to
support advocating a different course. In a similar manner, the Feb. 26th,
2001 report of the CC:DA Task Force on an Appendix of Major and Minor Changes (http://www.ala.org/alcts/organization/ccs/ccda/tf-appx4.doc)
seems disposed toward the creation of separate records in a large number of
cases.
The emergence of aggregators* accounts
for a high percentage of the multiple manifestations that catalogers confront.
When it comes to discerning differences in the various kinds of aggregations,
the most significant one in the eyes of the SCA Aggregators task group was
the quantity of titles covered. If there are fewer than 200-300 titles, a library
could realistically keep up with the cataloging; if greater than 1000 titles,
it was critical to get the vendor's cooperation. So far, the larger ones have
turned out to be the more volatile, and only the provider is in a position
to keep up with added/dropped titles and the latest volume coverage available
for the retained titles.
Another meaningful distinction exists between publisher-based or special
project-based aggregations and those which involve the licensing of copyrighted
materials. The former seldom if ever drop titles. The electronic offerings
of OCLC WorldCat Collection Sets (http://www.stats.oclc.org/cgi-bin/db2www/wcs/wcs_cols.d2w/Electronic)
appear to represent this more stable type of aggregation exclusively.
In view of the interest among libraries in minimizing duplication of records,
there are a number of potential solutions:
- Dedup and add URLs to bib records for hard-copy versions (California State
University, Northridge did this for Ebsco aggregator records and now thinks
better of that decision.)
- Dedup and add URLs to holdings records for hard copy versions (University
of Washington is doing this in its Innovative OPAC.)
- Collect the monthly and weekly updates from all sources and then dedup
and consolidate them locally, just prior to loading them to the OPAC. (This
way the proliferation of additional records for the same title is held to
a mere single record representing all e-versions. No example institutions
currently employing this tactic are known.)
- For all aggregations cataloged by the library community, consolidate all
the coverage onto a single record in the bibliographic utility. (If a utility
uses the master record model, participants in cooperative cataloging projects
would have to be equipped with relatively powerful authorizations, the kind
which can lock and replace records.)
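The third option above (local deduplication and consolidation just prior to loading) can be illustrated with a minimal sketch. It is not a description of any institution's actual workflow. It assumes, purely for illustration, that each incoming e-version record has been reduced to a plain dictionary with hypothetical keys (issn, title, provider, url, coverage) and that a shared ISSN is an adequate match key. In practice versions may carry different ISSNs or titles, so a real match key would need more care. The ISSNs, titles, and URLs in the sample data are made up.

```python
def consolidate_eversions(records):
    """Merge incoming e-version records that share an ISSN into one record per title.

    Each input record is a plain dict with the hypothetical keys
    'issn', 'title', 'provider', 'url' and 'coverage'.
    """
    merged = {}
    for rec in records:
        key = rec["issn"]  # assumption: ISSN is the match key
        if key not in merged:
            merged[key] = {"issn": key, "title": rec["title"], "access": []}
        entry = {
            "provider": rec["provider"],
            "url": rec["url"],
            "coverage": rec["coverage"],
        }
        # Skip exact duplicates that arrive in more than one update file.
        if entry not in merged[key]["access"]:
            merged[key]["access"].append(entry)
    return list(merged.values())


# Made-up sample data standing in for monthly and weekly update files.
sample_records = [
    {"issn": "0000-0001", "title": "Journal of Examples", "provider": "Provider A",
     "url": "http://a.example.org/joe", "coverage": "v.1 (1995)-"},
    {"issn": "0000-0001", "title": "Journal of Examples", "provider": "Provider B",
     "url": "http://b.example.org/joe", "coverage": "v.10 (2000)-"},
    {"issn": "0000-0001", "title": "Journal of Examples", "provider": "Provider A",
     "url": "http://a.example.org/joe", "coverage": "v.1 (1995)-"},
    {"issn": "0000-0002", "title": "Example Review", "provider": "Provider A",
     "url": "http://a.example.org/er", "coverage": "v.3 (1998)-"},
]

consolidated = consolidate_eversions(sample_records)
for rec in consolidated:
    print(rec["issn"], rec["title"], [a["provider"] for a in rec["access"]])
```

Keeping each provider's URL and coverage together in one entry is a deliberate choice in this sketch: it addresses the concern raised in section IV about being able to tell what is being said about any given provider, and it makes removing a dropped provider a matter of deleting one entry.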
The one difference among manifestations that most warranted separate records,
in the eyes of this task force's members, is significant difference in content.
If we could be assured that the licensed content of an aggregated e-resource
manifestation matched that of the original publication, there would be no more
interest in multiple bibliographic records than there is for multiple file formats.
If providers of aggregations resembled jobbers more than republishers, then we
could avoid the proliferation of catalog records. Might the situation be more
of a publishing problem than a cataloging problem? What if the library community
advocated standards for licensing of content, e.g. always include all the text
and all the graphics? In the event this is just not possible, then clear acknowledgement
of the missing content should be provided in the resource itself.
In the meantime, at a number of ALA Midwinter meetings great interest was
displayed in pursuing the idea Michael Kaplan has put forward, i.e. unity of
display for e-versions in the public services setting while retaining separate
records behind the scenes for technical services purposes. Possible ways to
pursue this include examining what potential content in records for multiple
e-versions could support a programming effort to successfully identify the
members of a "bibliographic family" and thus have a chance to pull together
a unified display. Potential strategies are (1) inclusion of some linkage in
existing bibliographic records such as 760-787 fields and (2) creation of some
kind of meta-record that would explicitly list the control numbers of records
for what are considered to be equivalent manifestations. (Judgments would still
need to be made about what belongs in the same cluster of records and what
properly goes into a separate cluster.)
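To make the two strategies more concrete, the following is a rough sketch of how a program might pull together a "bibliographic family" from linkage data and emit something like a meta-record. It is offered as an illustration under stated assumptions, not as a proposed design. The input is assumed to be a mapping from each record's control number to the control numbers it links to (a stand-in for record control numbers carried in 760-787 linking fields); the function name and the sample control numbers are hypothetical.

```python
def cluster_families(records):
    """Group records into 'bibliographic families' using their linking entries.

    `records` maps a control number to the list of control numbers it links to.
    Returns a sorted list of families, each a sorted list of control numbers.
    """
    parent = {}

    def find(x):
        # Return the representative of x's cluster (union-find with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for ctrl, links in records.items():
        find(ctrl)  # register records that have no links at all
        for other in links:
            union(ctrl, other)

    families = {}
    for ctrl in parent:
        families.setdefault(find(ctrl), []).append(ctrl)
    return sorted(sorted(members) for members in families.values())


# Made-up control numbers: rec001 through rec003 are linked, rec004 stands alone.
sample_links = {
    "rec001": ["rec002"],
    "rec002": [],
    "rec003": ["rec001"],
    "rec004": [],
}

meta_records = cluster_families(sample_links)
print(meta_records)
```

Each inner list plays the role of a meta-record: an explicit list of control numbers judged to belong together. Links are treated here as symmetric and transitive, which is itself an assumption. As the parenthetical note above says, human judgment is still needed about what belongs in the same cluster, so a link that should not merge two families would have to be excluded before clustering.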
As presented at the April 2001 Joint Steering Committee meeting, Jennifer
Bowen will lead a task force that will experiment with new ways to deal with
multiple versions. She will be working with representatives
from each of the JSC constituents and with OCLC Europe. The members of this
PCC task force look forward to the results of the JSC group with great interest.
Compiled by John J. Riemer
The helpful ideas and suggestions of Valerie Bross and Jean Hirons are gratefully
acknowledged.
* An aggregator is an e-resource supplier
which provides access to full-text collections of e-journals. The user may
browse by title or issue; or may search by tables of contents or full text.
Some authors limit use of the term "aggregator" to a supplier of e-journals
from various publishers; other authors use "aggregator" to refer to any supplier
of a collection of e-journals. Examples: Project Muse, JSTOR, Catchword (For
use of term, compare: http://toltec.lib.utk.edu/~colldev/annual98.html versus http://www.library.ucsb.edu/istl/00-summer/article2.html)
An aggregator database is a resource that provides access
[through a database interface] to journal articles without necessarily providing
access to the whole issue of a journal (based on the definition given at http://elibrary.unm.edu/Ejournal/index.shtml).
Aggregator databases usually license content from copyright holders, rather
than owning all content. This results in volatility of holdings over time.
Examples: Academic Universe, Proquest Education Complete, Proquest Research
II and ABI/Global Inform. http://www.library.ucsb.edu/istl/00-summer/article2.html.
A database is a collection of logically interrelated data
stored together in one or more computerized files, usually created and managed
by a database management system (Subject Cataloging Manual H 1520).