Standing Committee on Automation
Task Group on Automated Classification
Final Report
December 29, 2000,
approved by the SCA, January 2001
Task Group members: Kyle Banerjee, Oregon State University; Matthew Beacom,
Yale University; Martin Kurth, Cornell University; Louise Ratliff, University
of California, Los Angeles; Gary L. Strawn, Northwestern University, Chair;
Diane Vizine-Goetz, OCLC, Inc.; David Williamson, Library of Congress
Summary
Call numbers are the single most important device for the arrangement of
library materials. Classification numbers are one of the primary tools for
providing subject access to collections, and are one means by which access
can be provided to networked resources. Although many areas of library operations
have been automated successfully, automation has yet to have the effect on
the assignment of classification and call numbers it has had on other aspects
of the cataloging process. Because work with classification and call numbers
remains largely a manual operation, it continues to constitute a significant
part of the cost of cataloging.
Substantial opportunities exist for library software vendors to enhance their
products to assist with the generation of classification and call numbers.
Some of these opportunities call for comparatively small changes to existing
modules, some call for the integration of existing features into a more seamless
whole, and some call for extensive new development, perhaps in cooperation
with other vendors. The following list identifies the major functional areas
we feel need to be addressed if library software is to provide effective assistance
in the generation of call numbers. The provision of any of these would be a
step forward; the incorporation of all of these into a single tool would be
a major advance in library automation.
- Classification schemes commonly used by libraries should be made available
in the MARC format
- Applications that enable searching and display of machine-readable classification
schemes should not be solely stand-alone programs or services, but should
also provide an interface that allows an external application to query the
classification data to retrieve information in a useful manner, to present
the results of the query to the user and otherwise to interpret and act on
the results of the query
- Library systems should be able to generate the following products from
the catalog against which call number assignment is performed and present
them to the operator: a list of the classification numbers most frequently
associated with a subject heading, a list of the subject headings most frequently
associated with a classification number, and a list of the call numbers assigned
to other editions or versions of the item being cataloged
- Library systems should be able to determine that a call number is unique
in the local catalog
- Library systems should be able to generate complete call numbers for members
of classed-together series, drawing information from the authority record
for the series and the bibliographic record for the analytic
- Library systems should be able to derive complete call numbers for belletristic
works from the classification numbers found in authority records for literary
authors
- Library systems should assist in the assignment of internal and final
Cutter numbers, drawing on information in a bibliographic record and information
(such as subject headings in other records) present in the catalog against
which call number assignment is performed
- Library systems should provide some means for identifying local exceptions
to classification practice (based on criteria such as type of material, location,
and classification number), and should provide some means for accommodating
those exceptions
- System call number indexes should be sorted in the proper order
The attached report provides additional background, descriptions, justification,
and a scheme for implementation.
Call numbers are one of the two principal methods by which libraries provide
subject access to materials, and are the primary means for locating items on
shelves. A user who identifies a particular item of interest in the catalog
uses the call number directly to retrieve the item from the shelf; the call
number provides the link between the surrogate in the catalog and the item
described. Classification numbers make manifest the organization of knowledge
embodied in a classification scheme, thereby bringing together materials on
the same and related topics. A user with a general information need can find
one or more likely classification numbers in the catalog and use them to browse
the shelves for useful items.
The value of classification is not limited to the physical items controlled
directly by an individual library, but has been extended to networked electronic
resources. Classification numbers can be assigned to surrogates for Web resources
(either maintained separately in a database, or embedded in the resources as
part of the metadata) to enable more focused searching than is possible with
mere keyword retrieval; and classification numbers can be used to organize
the results of a search for networked resources, providing a more efficient
display for the user.[1]
Nearly every facet of library technical services has been redesigned to take
advantage of the benefits afforded by automation. The use of library utilities
such as OCLC has reduced the amount of time it takes to make an accurate and
complete record available in the local catalog. Acquisitions functions such
as ordering, billing and claiming are handled electronically. Tools such as
Cataloger's Desktop that allow rapid searching of a wide variety of resources
help catalogers create records quickly and efficiently. Automated assistance
has even been made available for the creation of authority records, a task
which in its current realization would have been beyond the capabilities of
systems in use a decade ago. All this automation has not only made possible
the generation of records of more consistently high quality and at lower cost,
but also increased the sharing of these records. The savings in staff time
coupled with increases in quality together demonstrate the advantages to be
gained from the use of sophisticated automated routines in library processing
operations.
At most institutions, the assignment of classification and call numbers[2]
is a process not markedly different from that followed thirty years ago. Because
automated assistance has not yet been brought to these activities, today's
typical cataloger must follow a sequence of operations changed from work performed
thirty years ago only to the extent that the local catalog and call number
list are available at the desktop, and not in another room or on another floor.
The local library system at present does nothing to speed or ease the work
involved other than to make records that were once held on cards available
in electronic form.
Because subject analysis remains primarily a manual operation, largely untouched
by advances in automation, the cost of producing call numbers has held steady
while the cost of other parts of cataloging has declined; working with call
numbers forms an increasingly large part of the cost of processing an item.
The most important reason for this is that the processes for generating call
numbers are complex and have traditionally required multiple tools that were
available in paper format only. To complicate matters, many libraries establish
local policies for call numbers, making it difficult for vendors to develop
general tools that could be used by many libraries.
This picture is about to change. The recent availability of some classification
schemes in some kind of machine-readable form, the completion of at least the
bulk of retrospective conversion at many institutions (providing a suitable
basis for local decisions) and continued advances in the sophistication of
automation included in local library systems all point to the possibility that
the era of the manually-assigned call number may be near its end. The time
seems suitable for libraries to present to library automation vendors a scheme
to provide automated assistance for work with classification and call numbers.
At its meeting during the 1999 ALA Annual Conference in New Orleans, the
Standing Committee on Automation (SCA) of the Program for Cooperative Cataloging
(PCC) approved the formation of a task group to study the possibility of automated
approaches to the assignment of classification and call numbers. A charge for
the group was prepared by Karen Calhoun, Chair of SCA, and approved by the
PCC Board at its November 1999 meeting. The basic idea motivating this group
is that a coordinated plan for providing automated assistance with classification
and call numbers would benefit both library system vendors and librarians.
The task group met as a body once, during the 2000 ALA Midwinter Meeting
in San Antonio. At that meeting, a general plan for action was developed, and
assignments made to members of the group. The remainder of the group's work
took place via e-mail exchanges. The present document represents the task group's
summary of its activities and conclusions. The task group hopes that this document
will help libraries and vendors both understand what may be involved in the
provision of automated assistance in call number assignment, and provide a
common framework under which projects can be undertaken and evaluated.
As far as is known, no comprehensive tool now exists in production that
draws on information held in a local library system to assist in the assignment
of classification and call numbers. However, a number of projects undertaken
at various institutions over the past decade have attacked one or more components
of the process. The following paragraphs describe several of what we feel to
be significant or notable instances of the automation of one part or another
of the process of assigning call numbers. This list is not intended to be comprehensive,
but simply an indication of the kind of work that has been done to advantage
at various institutions.
- WebDewey. A browser-based version of the Dewey decimal
classification scheme (schedules, tables, manual, index, built numbers from
the index), enhanced with LCSH headings mapped (either editorially or statistically)
to DDC numbers, and the LCSH authority records for the linked headings. Currently
accessed through the CORC project home page, WebDewey can propose Dewey classification
numbers for Web pages as they are harvested for cataloging in CORC; it does
this by scanning the text of the Web page for appropriate grammatical constructions,
and linking the keywords they contain to the Dewey classification index.
WebDewey is also useful in non-Dewey libraries: the LCSH headings associated
with the proposed Dewey numbers can be a useful starting point for libraries
that use other classification schemes. WebDewey also performs the number-building
functions available in the CD-ROM Dewey for Windows product (which
WebDewey is expected to replace in the next several years). WebDewey (like Dewey
for Windows) does not interact with the local library system. It builds
on technology developed at OCLC for Project Scorpion.
- DESIRE Project [URL http://www.desire.org not working as of July 5,
2005].[3] The DESIRE Project examined the construction of tools to build
Internet search services. Among these is a tool for automatically assigning
classification numbers to Web documents in engineering.
- Web version of Library of Congress classification. This
tool, which is not yet in production and was only very recently made available
for review outside the Library of Congress, uses a hierarchical browse interface
to identify Library of Congress classification numbers. As the user drills
down through the classification hierarchy, the interface builds the corresponding
classification number. The application at present is a stand-alone application
that does not interact with LC's online system (or any local system). This
interface is seen as being of special advantage to those not familiar with
a given area of the LC schedule, and for copy catalogers verifying a number
found in an existing record. Naturally, this tool draws on only those parts
of the LC classification that have been converted to machine-readable form.[4]
- Assisted assignment of Cutter numbers. Several tools have
been developed at various institutions to assist catalogers in the assignment
of Cutter numbers and the completion of call numbers. Most of these are stand-alone
applications that simply look up the Cutter number in a table, or (for
the LC classification) use an algorithm to derive an approximate Cutter number.
The tool developed at Oregon State University takes this notion one step
further. Using this tool, the cataloger stores the classification number
in the Windows clipboard and types the first few letters of the main entry
into a dialog box. The application calculates the Cutter number and adjusts
it to conform to local practice. When the operator presses a function key,
the program performs a call number search in the local catalog (combining
the stored classification number and the derived Cutter number). The operator
completes work on the call number without further assistance from the application.
This tool reduces the time and effort needed to shelflist an item, once the
operator has formulated the classification number.
- Northwestern University's Cataloger's toolkit. The cataloger's
toolkit used at Northwestern with the NOTIS system until 1998 provided, with
little operator intervention, a completely-shelflisted call number. (Northwestern's
main library classifies most materials according to the Dewey scheme.) The
program examined the completed bibliographic record, took into account local
exceptions to standard practice, consulted bibliographic and authority records
in the local database at appropriate points, and asked the operator for guidance
when the appropriate course of action was not certain. With a few clicks
of the mouse, the operator was able to create a complete call number, fitted
comfortably into the web of existing call numbers and conforming closely
to local practices. (A version that works with the Voyager system is in development.)
This tool did not have access to a machine-readable version of any classification
schedule.
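The table lookup and local adjustment that the Cutter tools above perform can be illustrated in a few lines. The following is a minimal sketch under stated assumptions, not the code of any tool named in this report: the letter-to-digit table is a hypothetical one that simply divides the alphabet into nine ranges (published Cutter tables are larger and differ from scheme to scheme), and the function names `ideal_cutter` and `fit_cutter` are invented for the example.

```python
import string

def letter_digit(ch):
    """Map a letter a-z to a digit 1-9 (hypothetical, evenly divided ranges)."""
    index = string.ascii_lowercase.index(ch)
    return 1 + index * 9 // 26

def ideal_cutter(entry, digits=2):
    """Build an 'ideal' Cutter: initial letter plus digits for the next letters."""
    letters = [c for c in entry.lower() if c in string.ascii_lowercase]
    if not letters:
        raise ValueError("entry contains no letters a-z")
    tail = "".join(str(letter_digit(c)) for c in letters[1:1 + digits])
    return letters[0].upper() + tail

def fit_cutter(cutter, used):
    """Extend the Cutter with further digits until it is unused locally.

    This only guarantees uniqueness; it does not check that the result
    files in the correct alphabetical position among its neighbors.
    """
    if cutter not in used:
        return cutter
    for digit in "23456789":
        candidate = cutter + digit
        if candidate not in used:
            return candidate
    # Every one-digit extension is taken: extend again from the middle.
    return fit_cutter(cutter + "5", used)
```

With this illustrative table, `ideal_cutter("Smith")` yields `S53`, and `fit_cutter("S53", {"S53"})` yields `S532`. A production tool would replace the table with the one prescribed by the classification scheme in use and would apply local shelflisting rules when adjusting the number.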
These and other pioneering efforts clearly demonstrate that substantial benefits
(lower cost, higher productivity and closer adherence to standards) can be achieved
if automated assistance for any part of the work needed to assign call numbers
is available; and that benefits should continue to accrue as more parts of
the process are automated and joined together to form a comprehensive set of
tools. By making greater use of technologies they have developed and whose
value they have already demonstrated, libraries are better equipped to handle
the increasing workloads brought by the addition of items in electronic formats
to the list of materials waiting to be processed.
The automation of call number assignment has as yet been untouched by major
system vendors. Those preparing auxiliary products (such as machine-readable
classification schemes) have not yet developed packages that can function in
the broader world. Such tools as exist do not typically interact with the
local system, or with other tools. From its examination of the process by which
call numbers are assigned in its members' own institutions and the foregoing list
of successful automation projects, the task group has identified a number of
areas now ready for automation. Much of this work needs to be done by, or in
collaboration with, vendors of library systems, as the tasks to be performed
must be done, at least in part, in the context of the local database.
The Task Group presents this set of operations ripe for automation as a set
of independent descriptions. A local library system could implement any one
of the functions described below, or any given number of them, as independent
tools, which the cataloger could draw on as needed. Beginning at this scale
would allow institutions to reap immediate benefits from what would in many
cases be a modest amount of development effort on the part of library system
vendors. Applying this approach, system vendors could gain experience in the
largely untested area of call number assignment, and modify these individual
functions quickly as they receive feedback from system users. Such work would
in effect allow libraries to replace many of the manual steps required for
the assignment of call numbers with automated steps. New features could be
added as systems developers gained confidence in this new area; the eventual
outcome might be a tool that smoothly assists the cataloger in most aspects
of call-number assignment in a unified operation; this new tool could allow
libraries further to adjust workflow to take best advantage of it.
- Correctly arrange call numbers in the call number index. Portions
of the classification numbers used by some schemes should be treated as values,
not as simple strings of characters. The portions of classification numbers
to which this consideration applies vary from scheme to scheme.
- Check a call number for duplication. The system searches
the call number supplied by the operator against call numbers of the same
type in the local database. If the number has been used in a different record,
the system notifies the operator of the duplication and provides an easy
means for the operator to review the associated holdings and bibliographic
records. If the number has not been used in another record, the system simply
reports the fact. (Optionally, the system could say nothing at all if the
call number is unique.) Such a feature would not in any way prevent the operator
from adding a record with a duplicate call number (call numbers might be
re-used for any of several reasons), but would merely provide notification
that the number had, or had not, already been used.
This feature could also be made part of a system's bibliographic batch-loading
routine. The system could test the call number in an incoming record for
duplication; the system could either load all records and report duplicates,
or send records with call numbers already present to a holding file for review
and manual update. The system might also provide an arbitrary mechanism (such
as adding an "x" to the end of the number) to distinguish otherwise-identical
call numbers.
- Complete a call number for a member of a classed-together series. The
local copy of the authority record for a series should identify (either explicitly,
or by default[5]) the series classification
practice followed at the local institution. If the series is classed together,
the authority record also contains the basic local classification number
(again, identified either explicitly or by default). A local system acting
either on its own initiative or upon operator request should be able to determine
from the series authority record the local series classification practice;
if the series is classed together according to local practice, the system
should be able to add the series numbering (subfield $v) from the corresponding
bibliographic series heading to the basic call number from the authority
record to form the complete call number. The system should also check the
completed call number for duplication, and present it to the operator for
approval.
- Use author numbers in authority records. In the absence
of a locally established classification number for a literary author, use
the classification number in the author's authority record (if any) as the
basis for the call number for a belletristic work.
- Complete a call number by assigning a Cutter number. The
operator provides a partially-completed call number and indicates in some
manner the information in the bibliographic record on which the Cutter number
should be based. The system determines the "ideal" Cutter number and investigates
the relevant portion of the local call number index. The system does not
simply determine whether or not the "ideal" number is already present, but
examines items with similar numbers to determine the basis on which they
have been assigned similar Cutters, and adjusts the "ideal" number as necessary
to fit comfortably with existing numbers.
This feature would be of use in many classification schemes. For classification
schemes (such as Library of Congress) that use multiple Cutter numbers, the
feature could be designed to handle both the terminal Cutter (which is often
based on the main entry or title) and internal Cutters (which are
often subject-based).
- Make use of machine-readable records for classification data. The
structure of records constructed according to the MARC format for classification
data parallels in broad outline that of records constructed according to
the MARC authority format: there are established classification numbers,
reference tracings from unused variant numbers, and reference tracings for
related numbers; there are also notes and links to associated subject headings.
Records for classification data could be loaded into a local file in a manner
that closely parallels the manner in which library systems already handle
authority records, and index entries could be generated from them. Such index
entries, together with the formatted displays generated by the system from
the associated classification records, could replace the print and CD-based
versions of some classification schemes now available. To make the classification
information even more useful, the index entries generated from classification
records could be merged with index entries generated from call numbers in
holdings records in the local file. The resulting hybrid index would work
in a manner parallel to that provided by indexes that merge information from
authority and bibliographic records, and provide products parallel to those
generated from the headings index that mingles bibliographic and authority
information. (Such products include lists of new classification numbers,
unestablished classification numbers, and classification numbers that match
a reference to some other number.)
Records in the MARC classification format can contain elaborate instructions
for building numbers; these instructions are in many instances designed so
that they may be interpreted and acted on by computer programs. The potential
exists for library systems that incorporate machine-readable classification
data to offer assistance in the building of the complete classification number,
and to validate numbers found in existing records and those newly added to
the local database.
As far as is known, there has been no move over the past decade on the part
of any library system vendor to investigate the possibility of using machine-readable
classification data, not even to the simple extent of making the records
available as a stand-alone file for consultation.
- Make classification schemes available in machine-readable form. The
MARC format for classification data was published in 1991. Machine-readable
records for the Library of Congress classification scheme began to appear
in 1997. After input of all LC classes is complete, additions and changes
will also be in machine-readable form. The developers of other classification
schemes have not followed the lead of the Library of Congress, and are not
yet working on MARC-format versions of their schemes. (For example, although
the Dewey scheme is maintained in a proprietary machine-readable format that
is theoretically convertible into the MARC classification format, there has
been no movement to make the Dewey scheme available in this manner.) Suppliers
of other classification schemes commonly used in American libraries, such
as the National Library of Medicine scheme, are even less ready to deliver
records in machine-readable form. Although the availability of the Library
of Congress classification in MARC format will satisfy the needs of many
libraries, some libraries will not be able to make use of some new system
features until machine-readable records for additional classification schemes
are prepared. Providers of the principal classification schemes used in libraries
are urgently requested to make their schemes available in the MARC format.
- Provide open access to machine-readable classification schemes. Systems
and applications that provide access to classification schemes in machine-readable
form should not be closed to outside queries, but should instead provide
an interface that permits other applications to query the classification
information and retrieve results in a manner amenable to further manipulation.
The Library of Congress and Dewey schedules are both distributed in "electronic" form
on CD-ROM. Both products store their data in proprietary formats, and neither
is open to access from applications other than that provided by the included
search/retrieval software. (For example, an external program cannot query
the data stored on the LC CD-ROM to determine whether a classification number
in a bibliographic record is valid.) The Web versions of these products suffer
from similar limitations. Although both of these applications are useful,
they would be even more useful if their data stores could be searched by
other programs and if the information retrieved from them were presented
in a standard format (i.e., in the MARC format) instead of a proprietary
format or as text.
- Provide services that aid in the formulation of classification
numbers. Local systems should provide searches and other tools
that can assist the cataloger in the assignment of classification numbers.
Such services include the following:
- Generate a list of classification numbers associated with a given
subject heading.[6] For headings
built according to the Library of Congress scheme, this list would
ideally include only cases in which the given subject heading appears
as the first subject heading in other bibliographic records.
The list should be arranged in decreasing frequency of occurrence.
- Generate a list of subject headings associated with a given call
number. Again, this list should include records in which the subject
heading is the first subject heading in the record, and should be
arranged in decreasing frequency of occurrence.
- Notify the operator that the collection already holds other editions
or versions of the work contained in the item being cataloged,
and generate a list of call numbers assigned to those other editions
and versions.
- Be aware of local exceptions to standard classification practice. An
institution will often choose to make local exceptions to standard classification
practice. For example, an institution that normally classes materials with
the Library of Congress scheme may classify materials bound for certain locations,
or with certain characteristics (such as a particular type of material) in
an exceptional manner.[7] Although library
systems written for general use cannot necessarily be expected to be able
to apply the local exceptions to classification practice, the systems
should provide a means for identifying those exceptions, so the
system can notify the operator to divert materials for special handling as
required.
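The first function in the list above, arranging call numbers so that numeric portions are treated as values rather than as strings of characters, can be sketched as a sort key. This is a minimal illustration for LC-style call numbers only; the simplified pattern and the name `lc_sort_key` are assumptions made for the example, and a real index would need rules for each classification scheme it supports.

```python
import re

# Simplified pattern for an LC-style call number such as "QA76.73.P98 L88":
# class letters, a whole number, an optional decimal extension, then the rest.
_LC = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?\s*(.*)$")

def lc_sort_key(call_number):
    """Return a key that files the whole number as an integer value."""
    match = _LC.match(call_number.upper())
    if match is None:
        # Numbers that do not fit the pattern file last, as plain strings.
        return (1, call_number.upper())
    letters, whole, decimal, rest = match.groups()
    # The decimal extension and Cutters are decimals, so comparing them
    # as strings of digits already gives the correct order.
    cutters = rest.replace(".", " ").split()
    return (0, letters, int(whole), decimal or "", cutters)

numbers = ["QA76.9.D3", "QA9.A5", "QA76.73.P98", "Q1.S3"]
print(sorted(numbers))                   # plain string order files QA76 before QA9
print(sorted(numbers, key=lc_sort_key))  # value order files QA9 before QA76
```

Sorting the same four numbers as plain strings places QA76.73.P98 ahead of QA9.A5, which is the misfiling the recommendation is meant to prevent.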
Any of these capabilities added to a library system, or a package containing
all of them as a group of separate functions, would be considered a major
advance in library automation. In the longer term, to reap the greatest advantage
from this development effort and to allow libraries to realize the greatest
cost savings, these discrete system functions should eventually be incorporated
into a comprehensive function that leads the cataloger in a single complex
step to a call number ready to use in the record for an item. This comprehensive
function could be built in large part by assembling into a whole the discrete
functions listed above. If these discrete functions are built from the beginning
with the view that they might eventually become part of such a comprehensive
feature, the work of uniting them into a seamless whole will be reduced.
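To make the idea of discrete, combinable functions concrete, two of the functions listed above (the duplicate check and the list of classification numbers associated with a subject heading) can be sketched as independent routines over the same data. This is an illustration only: the in-memory `CATALOG`, its record layout, the sample records, and all function names are hypothetical, and taking the text before the first space as the classification number is a deliberate simplification.

```python
from collections import Counter

# Made-up sample records standing in for the local catalog.
CATALOG = [
    {"id": 1, "call_number": "QA76.73.P98 L88",
     "subjects": ["Python (Computer program language)"]},
    {"id": 2, "call_number": "QA76.73.P98 M37",
     "subjects": ["Python (Computer program language)"]},
    {"id": 3, "call_number": "QA76.76.O63 T36",
     "subjects": ["Operating systems (Computers)"]},
    {"id": 4, "call_number": "Z695 .B73",
     "subjects": ["Subject cataloging", "Python (Computer program language)"]},
]

def class_part(call_number):
    """Treat the text before the first space as the classification number."""
    return call_number.split()[0]

def find_duplicates(call_number, catalog, own_id=None):
    """Return the ids of other records that already carry this call number."""
    return [r["id"] for r in catalog
            if r["call_number"] == call_number and r["id"] != own_id]

def class_numbers_for_heading(heading, catalog):
    """Count classification numbers on records whose first heading matches,
    returned in decreasing frequency of occurrence."""
    counts = Counter(class_part(r["call_number"]) for r in catalog
                     if r["subjects"] and r["subjects"][0] == heading)
    return counts.most_common()
```

Because both routines accept the catalog as an argument and return plain lists, a later comprehensive feature could call them in sequence without change, which is the kind of reuse the paragraph above anticipates.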
An outline of the work performed by such a comprehensive feature might contain
the following steps, performed in this order:
- Determine, by examining the record for the item,[8]
whether the item being classified fits any defined local exceptions to
standard classification policy. If so, handle the exception if possible,
or present information regarding the exception to the operator.
- If the record for the item being classified contains any series headings,
check the corresponding authority record for each to find local classification
practice. If the item is a member of a series classed together, formulate
the complete call number and present it to the operator.
- Check the local database for other representations (editions, versions)
of the content carried in the item being classified. If any other representations
are present, notify the operator; as requested by the operator, base
the call number for the current item on one of the existing call numbers.
- If the bibliographic record for the item being classified already
contains a call number of the proper type, check it for validity; if
the number is valid, check it for duplication in the local file. Notify
the operator if the number is a duplicate; adjust the Cutter number as
necessary and appropriate.
- If the record for the item being classified contains a suggested classification
number of the proper type, check it for validity. If the number is valid,
complete it by assigning a Cutter block and checking the resulting number
in the local index.
- If none of the above conditions applies, generate a list of likely
classification numbers by drawing on information in the record for the
item being classified, and present the results to the operator. If the
operator selects one of these numbers, complete it by assigning a Cutter
block and checking the resulting number in the local index.
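The ordering of these steps can be sketched as a chain in which each step either proposes a result or passes the record on to the next. The sketch below is a simplified illustration, not a design from this report: it implements only three of the six steps, the record layout and the `SERIES_AUTHORITY` table are hypothetical, and a real system would also validate numbers, check for duplicates, and present each proposal to the operator for approval.

```python
# Hypothetical stand-in for series authority records in the local file.
SERIES_AUTHORITY = {
    "Hypothetical studies": {"classed_together": True, "base": "Q11 .H9"},
}

def local_exception(record):
    """Divert material that matches a locally defined exception."""
    if record.get("material") == "microform":
        return "divert: assign next sequential microform number"
    return None

def classed_together_series(record):
    """Combine the base number from the authority record with the numbering."""
    series = record.get("series")
    if series is None:
        return None
    authority = SERIES_AUTHORITY.get(series["title"])
    if authority is None or not authority["classed_together"]:
        return None
    return f'{authority["base"]} {series["volume"]}'

def existing_call_number(record):
    """Reuse a call number already present in the record (validation omitted)."""
    return record.get("call_number")

STEPS = [
    ("local exception", local_exception),
    ("classed-together series", classed_together_series),
    ("existing call number", existing_call_number),
]

def propose_call_number(record, steps=STEPS):
    """Run the steps in order; the first step that returns a proposal wins."""
    for name, step in steps:
        proposal = step(record)
        if proposal is not None:
            return name, proposal
    return "operator", None  # nothing applied: leave the decision to the operator
```

For example, a record whose series entry is `{"title": "Hypothetical studies", "volume": "v. 12"}` produces the proposal `Q11 .H9 v. 12` from the second step, while a record matching no step is handed to the operator.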
Closing remarks
The quickly-changing world in which we now all operate places an increasing
strain on the providers of information services. We must all continually examine
the tasks we perform, to make sure they continue to be necessary; and we must
continually seek ways to perform those tasks in the most effective manner.
The generation of classification and call numbers is a complicated task that
continues to be important in the networked environment. The next few years
should see the creation of automated tools that will help those assigning classification
and call numbers to bibliographic records and other types of metadata. The
development of such tools will require cooperation among librarians, other
information brokers and library system vendors, and will constitute a significant
advance in library automation. This work will bring automation to at least
a part of the task of subject analysis, the last large portion of library technical
services to receive such treatment.
Footnotes
- BUBL LINK (bubl.ac.uk/link) is one example of electronic
resources arranged by Dewey classification numbers. For an overview of the
use of classification numbers to arrange Internet resources, see The role
of classification schemes in Internet resource description and discovery
(http://www.ukoln.ac.uk/metadata/desire/classification/).
- The assignment of call numbers involves two principal steps
(which may take place simultaneously): subject analysis and shelflisting.
Call numbers used in libraries typically consist of two segments: a classification
number and a Cutter number. (The term number is used even though these two
segments often contain both numerals and letters of the alphabet.) The classification
number, constructed by the rules of a classification scheme or drawn from
a list of numbers valid in a scheme, provides an abstract representation
of the location of the subject matter contained in an item within the organization
of knowledge used in the classification scheme. The Cutter number, constructed
by the cataloger, distinguishes items with the same classification number
and allows for the appropriate subarrangement of materials.
Subject analysis is the determination of the nature and scope of the item
being described, and involves the location of that subject matter in a knowledge
organization system. Classification schemes with their corresponding notation
are one class of such systems; subject analysis in the context of a classification
scheme results in the assignment of a classification number. (Subject heading
systems such as LCSH and MeSH are another example of a knowledge organization
system; their use results in the assignment of one or more subject headings.)
The construction of the classification number is performed through consultation
of a published classification scheme, and involves also queries of the local
catalog to find other manifestations of the same work, other materials on
similar subjects, and other materials bearing candidate classification numbers;
this work often also involves consulting authority records, for suggested
classification numbers.
Shelflisting involves the addition of arbitrary symbols of various kinds
to the classification number to make a complete call number; this number
fits into a set pattern of subarrangement under a given classification number.
This task is accomplished in part by consulting the local system's online
index of active call numbers. (This task takes its name from the shelflist,
a card file arranged by call number, which has been replaced at least approximately
in online systems by an index of call numbers.)
- http://www.desire.org/ [URL http://www.desire.org
not working as of July 5, 2005]
- These portions of the Library of Congress classification
scheme are also available in the CD-ROM Classification Plus product.
Back to text
- The system could assume that a library follows LC/NACO
practice unless local practice is explicitly indicated with the library's
symbol in subfield $5. Here and elsewhere, any call number extracted from
an authority record must be of the 'type' (Dewey, LC, etc.) supported by
the local institution.
- The model described here works well for materials represented
by standard bibliographic records, including those prepared according to some
metadata schemes. If intended to assign classification numbers directly to
Web pages (which may or may not contain metadata, or even neatly-coded headers
with subject-loaded terms), such a service would need to examine the content
of the Web page itself.
- For example, microforms might be assigned sequential numbers;
sound recordings might be arranged using a locally-developed scheme
or the publisher number; videorecordings of feature films may be arranged
alphabetically under a general classification number; bibliographies cataloged
for the Reference collection may all be placed in LC's 'Z' class.
- The term record here refers both to a separate bibliographic
record in MARC format or some other format, and to metadata stored as part
of an item.
Appendix 1: Charge to the task group