Standing Committee on Automation
Task Group on OCLC Batch Processing
June 2001
TG Members: Karen Calhoun (Cornell), Glenn Patton (OCLC), Robert Wolven
(Columbia), Edward Weissman (Chair; Cornell)
1. EXECUTIVE SUMMARY
The number of PCC records available in WorldCat could be significantly increased,
and the value of BIBCO participants' work maximized, by implementing a few
practicable changes in OCLC's batch processing procedures. The body of this
report contains the specific recommendations of the SCA Task Group on OCLC
Batch Processing.
The current task group revisited issues raised by a 1998 SCA task group on
OCLC batchloading. The time was right to reconvene an OCLC batchloading task
group, in light of OCLC's invitation of PCC comments on its move to a relational
database management system for WorldCat. Besides considering the issues raised
by the 1998 group, the task group also considered the findings of three surveys
they conducted in April 2001--one survey each of BIBCO liaisons, CONSER liaisons,
and OCLC Users Council delegates. Users Council delegates were surveyed due
to the concerns they had expressed in one of their meetings about the overlay
of OCLC member-contributed records with PCC records.
The survey results suggest that OCLC batchloading libraries that are BIBCO
contributors are generally dissatisfied with contributing their Program records
via OCLC batch processing. Implementing the task group's recommendations would
go far toward remedying their problems. A key finding of the surveys was that
BIBCO, CONSER, and OCLC Users Council members generally agree that allowing
batchloaded PCC records to replace data in non-PCC records is a desirable change. The
survey results further suggest that the most acceptable option for record replacement
is to allow a full BIBCO record to replace any non-BIBCO record and a core
level BIBCO record to replace a non-BIBCO record with an equal or lower Encoding
Level.
In addition to the survey research findings, the task group considered the
results of a research project performed by OCLC staff, using a sample of BIBCO
new and upgraded records prepared by Cornell in March 2001. Under current OCLC
batch processing rules, batchloaded BIBCO upgrades are discarded (after the
library's holding symbol is set on the existing record). High barriers also
exist for batchloading new BIBCO records. OCLC staff used the Cornell sample
to predict what would happen to the sample records if OCLC implemented the
task group's batchloading recommendations. The results suggest that, if OCLC
implements these batchloading improvements, the number of BIBCO records available
for use by the OCLC cataloging community will increase substantially. The changes
would also have a positive effect by upgrading vendor records. Further, OCLC's
existing procedures for merging certain information as records are replaced
will ensure that important classification data and subject access will be maintained.
The complexity of batchloading for CONSER contributors, evolving practice
for integrating resources, the transition of many libraries to new integrated
library systems, and OCLC's plans for extending WorldCat suggest that the PCC
should take a fresh look at the issues of batchloading in two or three years.
2. RECOMMENDATIONS
- OCLC should enhance batchloading to:
- Accept mixed files containing both BIBCO and non-BIBCO records
- Process and load the BIBCO records within 24-48 hours of receipt
- Use the following algorithm in batchloading the records:
- A full level batchloaded BIBCO record should replace any matching
non-BIBCO record in WorldCat.
- A core level batchloaded BIBCO record should replace a non-BIBCO
record at the core level EL or lower.
- BIBCO participants should upload records to OCLC no less than once a week.
- CONSER participants should continue to upgrade and maintain records in
OCLC for the present. In two to three years, when more experience has been
gained in cataloging integrating resources, PCC should consider convening
another task group to advise OCLC on batchloading for serial and integrating
resource records.
- In two-to-three years, convene a new task group on OCLC batchloading. Charge
this group to explore the new context for cooperative cataloging and batchloading
that will result from
- the transition of many PCC libraries to new integrated library systems,
and
- OCLC's changes to extend WorldCat and develop new services in cataloging
and metadata.
3. BACKGROUND
The original OCLC Batchloading Task Group, chaired by Carol Hixson (University
of Oregon), was charged in 1998 to advise OCLC on issues surrounding the batchloading
of bibliographic and authority records, especially as they related to PCC libraries'
contributions. The Group made a number of recommendations, which OCLC considered.
Because implementing several key recommendations was tied to the completion
of OCLC's Batchload Redesign Project, for which there was no schedule at the
time, the PCC Policy Committee discharged the group, with the intent to appoint
a new task group once OCLC was prepared to resume work on its batchload design.
At the November 2000 meeting of the PCC Policy Committee, the OCLC member encouraged
the Committee to designate a new task group, in light of OCLC's plans for moving
to a relational database management system for WorldCat.
Thus a new task group, under the aegis of the PCC Standing Committee on Automation,
was constituted and charged to advise OCLC on
- the PCC's requirements for batch processing of Program records, and
- batchloading in general, in the context of the PCC's broad interest in
fostering cooperative cataloging.
4. THE CHARGE AND WHAT WE DID
The Task Group was charged as follows:
- Assess the current environment. Conduct a brief survey of PCC libraries
to evaluate their current and near-term interest in FTPing BIBCO and CONSER
records to OCLC. Also ask for input on enhancements to existing OCLC batchloading
services (i.e., is there interest in new functionality?)
- Re-evaluate the 1998 Task Group's recommendations. Specifically, what is
the relative importance now and in the next two to three years to PCC libraries
of
- 24-hour turnaround of batch files
- batch maintenance of Program records
- batch upgrading/merging of existing WorldCat records with PCC records
- acceptance of PCC and non-PCC new and changed records in the same
FTP file
- equal credits for records input online and via FTP
- Propose a small research project to OCLC to predict how many and what types
of existing member-contributed WorldCat records would be replaced by PCC
records, should batch overlay of member records with incoming PCC records
be enabled.
- Consider changes in the cataloging environment in the next two to three
years that would result in additional batch loading requirements to support
PCC record transfer, encourage Program growth, and foster cooperative cataloging
in general. For example, evaluate the potential impact of PCC contributions
and updates of records describing serials and integrating resources.
- Work with OCLC Users Council representatives to learn about OCLC members' concerns
about batch loading of PCC records. Suggest strategies for addressing these
concerns.
The Task Group conducted and analyzed three surveys in April 2001--one each
of BIBCO liaisons, CONSER liaisons, and OCLC Users Council delegates. Response
rates were satisfactory--BIBCO liaisons returned 27 completed surveys, CONSER
liaisons 11 surveys, and OCLC Users Council delegates 41 surveys.
A second research project, intended to complement and inform the findings
from the surveys, was performed by OCLC. Cornell University Library provided
OCLC with a month's sample of BIBCO contributions, 695 records created in March
2001. OCLC evaluated the Cornell records to predict how many and what types
of existing OCLC bibliographic records would be replaced by BIBCO records,
should batch overlay of member records with incoming BIBCO records be enabled.
The results of that analysis and of the surveys are discussed below.
The OCLC Users Council survey results were discussed with the Users Council
at the May meeting. A short presentation was made to the Council as a whole
presenting the results. A follow-up discussion was held with the Collections
and Technical Services Interest Group. At that session, a preliminary version
of the Task Group's recommendations was presented and endorsed by the Interest
Group. The Interest Group also suggested that the Task Group's recommendations
be publicized widely to the OCLC user community.
5. WHAT WE LEARNED FROM THE SURVEYS
A majority of the reporting BIBCO libraries contribute their new BIBCO records
and upgrades online in OCLC (new records, 56%; upgrades 70%). There is little
use of CatME among the group as a whole.
Comments suggest that the BIBCO libraries that are working online in OCLC
are satisfied with this contribution method. When asked why they have not chosen
to batchload their records, they point to limitations of their local systems,
better editing in OCLC than in their local cataloging module, a relatively
small volume of BIBCO contributions, limitations of current OCLC batchloading
functionality, or other reasons.
About 20% of the reporting BIBCO libraries batchload some or all of their
BIBCO contributions. These libraries are generally dissatisfied with OCLC batchloading
as it currently functions. Problems noted by respondents include having to
separate BIBCO and non-BIBCO records into separate FTP files, loss of BIBCO
upgrades (because OCLC merely adds the library's holding symbol rather than
upgrading the existing OCLC record), problems with consortia or "group" loads,
and slow turnaround time for batchloaded files.
As far as enhancements go, the results in Table 1 suggest there is consistent
interest among both BIBCO and CONSER libraries in being able to send a "mixed
file" via FTP to OCLC--something that OCLC cannot handle at this time. A mixed
file contains both PCC program and non-program records, and both new records
and upgrades. OCLC would split out the mixed file upon receipt, so that the
various kinds of records can be processed appropriately. Some 69% of BIBCO
respondents and 64% of CONSER respondents are somewhat, quite, or intensely
interested in this enhancement.
Table 1. Interest in Being Able to Send a Mixed FTP File.
| | BIBCO Number | BIBCO Percent | CONSER Number | CONSER Percent |
|---|---|---|---|---|
| No interest at all | 2 | 7.7% | 4 | 36.4% |
| Not too interested | 6 | 23.1% | 0 | 0.0% |
| Somewhat interested | 7 | 26.9% | 2 | 18.2% |
| Quite interested | 6 | 23.1% | 3 | 27.3% |
| Intensely interested | 5 | 19.2% | 2 | 18.2% |
| TOTALS | 26 | 100.0% | 11 | 100.0% |
BIBCO, CONSER and Users Council respondents seem to agree that allowing batchloaded
PCC records to replace data in non-PCC records is a desirable change. Among
BIBCO respondents, over 85% were somewhat, quite, or intensely interested in
improved batchloading of BIBCO new records and upgrades. Among CONSER respondents,
64% were interested in being able to do maintenance via batchload, and 73%
were interested in being able to upgrade non-CONSER records to CONSER status
via batchload. Among Users Council respondents, 78% were somewhat in favor
or in favor of changes to OCLC batchloading that would facilitate the contribution
of PCC records to WorldCat.
BIBCO liaisons and Users Council delegates were asked to react to several
options for how PCC records should replace non-PCC records in WorldCat. The
results in Table 2 suggest that an acceptable option would be for any BIBCO
core record to replace any less-than-full non-BIBCO record, and for any BIBCO
full record to replace any non-BIBCO record.
Table 2. How Replaces Should Occur
| | BIBCO Number | BIBCO Percent | Users Council Number | Users Council Percent |
|---|---|---|---|---|
| Any BIBCO core replace less-than-full; BIBCO full replace all non-BIBCO | 11 | 40.7% | 21 | 51.0% |
| Any BIBCO core or full replace all non-BIBCO | 5 | 18.5% | 1 | 2.5% |
| Merge BIBCO and non-BIBCO | 5 | 18.5% | 13 | 32.0% |
| Other | 6 | 22.2% | 5 | 12.0% |
| Don't change current policy | N/A | N/A | 1 | 2.5% |
| TOTALS | 27 | 100.0% | 41 | 100.0% |
The concept of merging data from PCC and non-PCC records was less appealing
to BIBCO and Users Council respondents, and there was little consensus in their
comments about how merging should be done (except for a general desire to retain
fields that represent a different classification or subject scheme than is
present in the BIBCO record).
There was a good deal of consensus among BIBCO respondents about acceptable
turnaround times for batchloading. Because of the small number of respondents,
it is more difficult to interpret the CONSER findings; however, Table 3 suggests
that 24 hours is the most desirable turnaround time among BIBCO liaisons and
possibly among CONSER representatives.
Table 3. Desired Turnaround Time for Batchloading
| Q8. | BIBCO Number | BIBCO Percent | CONSER Number | CONSER Percent |
|---|---|---|---|---|
| 24 hours | 18 | 66.7% | 6 | 54.5% |
| 36 hours | 0 | 0.0% | 0 | 0.0% |
| 48 hours | 5 | 18.5% | 0 | 0.0% |
| 72 hours | 0 | 0.0% | 1 | 9.1% |
| Other or no answer | 4 | 14.8% | 4 | 36.4% |
| TOTALS | 27 | 100.0% | 11 | 100.0% |
Survey respondents were asked a series of questions about the
potential impact of OCLC batchloading enhancements on their preferences and
behavior. Based on the survey results, BIBCO participants are reluctant (47%)
or undecided (42%) about switching to batchloading as a contribution method,
even if batchloading were improved. There are, however, a small number of BIBCO
contributors for whom batchloading improvements would represent a real step
forward for their BIBCO work--New York University, Cornell, Columbia, Harvard,
Yale, and Stanford.
While improving batchloading is not likely to increase the number of
BIBCO participants, it will almost certainly increase the number of BIBCO records
available for use in OCLC. Respondents gave two reasons for this:
- BIBCO upgrades would be added to the OCLC database, rather than being discarded
in the batchloading process, as they are today; and
- a large number of items not now chosen for BIBCO treatment because of the
presence of vendor or other minimal-level records in OCLC would become candidates
for BIBCO cataloging.
5.1 CONSER Results
Results of the CONSER survey were mixed; not much consensus was evident on
most questions. With responses from 11 institutions, it's somewhat difficult
to draw conclusions.
Among CONSER respondents, interest in batchloading maintenance and upgrades
seems stronger than in batchloading new records. While 54% said they are somewhat
to very interested in batchloading new CONSER records, 64% said they are somewhat
to very interested in doing CONSER maintenance via batchload, and 73% said
they are somewhat to very interested in upgrading non-CONSER records via batchload.
Again, however, the apparently large percentage shifts really represent a slight
shift in degree of interest from 2-3 libraries; another example of the danger
of reading much into these results. Those who expressed interest in doing maintenance
via batchload also had diverse suggestions for fields to be retained when records
are replaced. The variety of response is illustrative of the complications
inherent in batch maintenance of dynamic records and suggests that more detailed
study is needed before any action is taken.
When asked if batchloading improvements would affect their choices or behavior,
CONSER libraries gave highly diverse responses. Some 36% of respondents said they
would probably or definitely begin to use batchloading as a contribution method
if batchloading were improved; however, 27% said they would probably or definitely
not change their current online practices, and another 27% were not sure how
they would react. Similarly, opinions were widely split about whether improvements
in OCLC batchloading would increase the number of CONSER records these libraries
contribute to the OCLC database.
Nevertheless, the results suggest that although not too many CONSER libraries
are interested in batchloading, those that are interested include some potentially
large contributors: Cornell, Indiana, Michigan, and Harvard.
5.2 Users Council Results
In past meetings, OCLC Users Council delegates have expressed concern about
the overlay of member-contributed records in WorldCat with PCC records. The
Users Council survey sought to determine Users Council delegates' concerns
and opinions about batchloading of PCC records and to gather recommendations
for OCLC.
Some 66% of the Users Council respondents say they are somewhat or very familiar
with the PCC's mission and goals, or they are actively participating in one
or more PCC programs. Another 22% have at least heard of the PCC, but 12% claim
no familiarity with the Program at all. When asked about their familiarity with
the PCC full and core records, 49% said they are not familiar at all or have
heard of them; another 51% said they are somewhat or very familiar, or they
contribute BIBCO or CONSER records to WorldCat.
As with the BIBCO and CONSER respondent groups, few Users Council respondents
are currently using batchload as a contribution method (12%).
As mentioned earlier in this report, 78% of the respondents are "somewhat
in favor" or "in favor" of OCLC's changing batchloading to facilitate PCC contributions.
It would seem, then, that changes being sought by PCC libraries would be favorably
received by a majority of Users Council delegates. As reported in Table 2,
Users Council respondents seemed to favor the option of any BIBCO core record's
replacing any less-than-full record and any BIBCO full record's replacing any
non-BIBCO record in WorldCat.
Like the CONSER and BIBCO groups, few Users Council respondents indicated
their libraries' choices or behavior would change as a result of improvements
in batchloading. Only 3 respondents stated that they would probably become
CONSER participants and only 2 respondents stated that they would probably
become BIBCO participants if OCLC implements enhancements to make it easier
to contribute PCC records via batchload.
6. WHAT WE LEARNED FROM THE ANALYSIS OF THE CORNELL
BIBCO RECORDS
Cornell University Library provided OCLC with a month's sample of BIBCO contributions,
695 records created in March 2001. OCLC staff evaluated the Cornell records
to predict how many and what types of existing OCLC bibliographic records would
be replaced by BIBCO records if OCLC's batchloading procedures were modified
according to the various replace options that were included in the surveys.
The full results of the analysis appear in an appendix to this report.
Of the 695 records in the Cornell file, 589 could be matched to existing WorldCat
records (including 271 records added to WorldCat by Cornell via CatME). 105
records could not be matched to existing records and, if loaded, would be new
to WorldCat. Coding problems in one record prevented it from being categorized.
If this had been a real batchload using the criteria recommended by the Task
Group and considering both the records that would be new to WorldCat and the
records that would be replaced, WorldCat would contain 325 more PCC records
after the load. Under current batchloading procedures, none of these records
would appear in WorldCat as program records.
Analysis of the replaced records revealed two key points:
- Comparison of the dates associated with Cornell's activity on the records
reinforced the need for prompt turn-around by PCC participants and by
OCLC in order to avoid duplicate cataloging efforts.
- The recommended changes for PCC batchloading would have a positive effect
by upgrading vendor records.
In addition, comparison of the notes and access points in the Cornell records
and the WorldCat records which would have been replaced revealed that, in 27%
of the cases, the Cornell record had fewer notes (not surprising, perhaps,
given the core record's emphasis on including only those notes "that support
identification of the item"). In only 18% of the cases, the Cornell record
had fewer access points. On balance, that seems not to be a significant loss
of access given that, in 82% of the cases, the Cornell record had as many or
more access points, and all the access points are supported by authority records.
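The percentages quoted in this section can be re-derived from the counts in Table 4 of the appendix. The following is a minimal sketch of that arithmetic; the dictionary keys and the `share` helper are illustrative names, not part of the OCLC analysis, and it assumes that "as many or more" access points covers the "greater", "same number and types", and "same number; different types" rows.

```python
# Counts copied from Table 4 in the appendix (285 potential replaces per column).
notes = {"greater": 41, "same": 154, "same_count_diff_types": 13, "fewer": 77}
access_points = {"greater": 89, "same": 128, "same_count_diff_types": 18, "fewer": 50}


def share(counts, *keys):
    """Return the rounded percentage of records falling in the given rows."""
    total = sum(counts.values())
    return round(100 * sum(counts[k] for k in keys) / total)


fewer_notes_pct = share(notes, "fewer")
fewer_access_pct = share(access_points, "fewer")
# Assumption: "as many or more" means every row except "fewer".
as_many_or_more_pct = share(
    access_points, "greater", "same", "same_count_diff_types"
)

print(fewer_notes_pct, fewer_access_pct, as_many_or_more_pct)
```

Under these assumptions the three values come out at 27, 18, and 82, matching the figures cited above.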
7. CREDITS
The Task Group did not fully explore the issue of OCLC credits for batchloaded
records. While this was a topic explored by the initial SCA batchloading task
group, since potential PCC batchloading libraries had expressed concern about
credits at the time, the current task group found little interest in the topic
among respondents to the surveys. Batchloading functionality, rather than credits,
appears to be the central concern of PCC libraries today. Task group members
did review the history of credits with OCLC, however. They learned that OCLC
online credits for contributed records were instituted in the mid-1980s when
OCLC began to charge for searching. The credits for the online input of new
records and for various kinds of record upgrade activity were instituted to
cover the cost of searching that is needed to determine that no record exists
already for the item and that may be needed to find other works using the same
headings (in order, for example, to determine a unique form of name for a person
or corporate body, etc.). If the institution creates new records or upgrades
existing records in the local system, the OCLC view is that the library has
not incurred those searching charges and, as part of the batchloading process,
the OCLC system incurs all the overhead in identifying duplicates and merging
data. As a result, the credit for original batchloaded records is lower.
8. CONCLUSIONS
The number of Program records available in WorldCat could be significantly
increased, and the value of BIBCO participants' work maximized, by implementing
a few practicable changes in OCLC's batchloading procedures. These changes
include: accepting mixed files of BIBCO and non-BIBCO records; prompt loading
of BIBCO records (combined with frequent uploading by BIBCO libraries); and
implementation of an algorithm that allows BIBCO records to replace non-BIBCO
under specified conditions. OCLC's existing procedures for merging certain
information as records are replaced will ensure that important classification
information and subject access will be maintained.
The situation regarding CONSER records is more complex, complicated both by
evolving practice in cataloging integrating resources and by the difficulty
of developing a replacement algorithm for dynamic records. These issues would
benefit from further PCC study in two-to-three years.
Changes in the cataloging environment also suggest that PCC should revisit
broader issues of batchloading in the context of cooperative cataloging in
that same 2-3 year time frame, as more member libraries adjust to the capabilities
of new integrated library systems and as OCLC implements planned changes in
its cataloging interfaces.
APPENDIX: ANALYSIS OF CORNELL'S MARCH 2001 BIBCO
RECORDS
There were 695 records in the file representing all of Cornell's PCC activity
for the month of March 2001. 589 of those could be matched to existing WorldCat
records (including 271 records added to WorldCat by Cornell via CatME). 105
records could not be matched to existing records and, if loaded, would be new
to WorldCat. Coding problems in one record prevented it from being categorized.
Of the 318 records that matched existing records (excluding the Cornell originals
added via CatME), 50 (16%) of those had the same encoding level as the WorldCat
record and 268 (84%) had a different encoding level.
Of the 50 records with the same encoding level as the existing WorldCat record,
17 Cornell records would replace the existing non-PCC record. In the cases
that would not replace, the existing records are DLC full or core-level records
or full or core-level records from other PCC participants. Also in this group
are 9 Cornell records with Encoding Level I or M which should presumably have
been changed to either 4 or blank.
Of the 268 records with different encoding levels, 123 (44%) of those were
Cornell Encoding level '4' records and 138 (53%) were Cornell Encoding level 'blank'
records. The remaining 7 Cornell records had another encoding level value (including
values 1, 5, 7, 8, and u) which should presumably have been changed to either
4 or blank.
Using the scenario of "any PCC full record would replace any member-input
record", all 140 of the Encoding level 'blank' records would replace the existing
WorldCat records. Of the 123 Encoding level '4' records, using the scenario
of "any PCC core record would replace a lower level record (including level
4 without an 042 field)", 63 (51%) would replace the existing WorldCat records.
Of that number, 8 of the WorldCat records are Encoding level 'K', 40 are 'M',
1 is a '4' without an 042, 10 are '5', 2 are '7', and 2 are '8' (non-DLC).
The remaining 60 (49%) of the Cornell core-level records would not replace
the existing WorldCat records because the Encoding Level of the WorldCat record
is higher. 15 were Encoding level 'blank', 3 were '4' with 042 'pcc', and 42
were 'I'.
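The two replacement scenarios applied above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Record` type, the `is_pcc` flag (standing in for the presence of 042 'pcc'), and the `LOWER_THAN_CORE` set are hypothetical names, the set of "lower" encoding levels is simply read off the outcomes reported in this appendix rather than taken from an official OCLC ranking, and the special handling of DLC records is not modeled.

```python
from dataclasses import dataclass

# Encoding levels this appendix reports as replaceable by a core-level record.
# Assumption: taken from the Cornell results above, not an official ranking.
LOWER_THAN_CORE = {"K", "M", "5", "7", "8"}


@dataclass(frozen=True)
class Record:
    encoding_level: str  # " " (blank) = full level, "4" = core level
    is_pcc: bool         # hypothetical flag: record carries 042 'pcc'


def would_replace(incoming: Record, existing: Record) -> bool:
    """Return True if the incoming batchloaded record would replace the existing one."""
    # Only program records replace, and only non-program records are replaced.
    if not incoming.is_pcc or existing.is_pcc:
        return False
    # Scenario 1: a full-level PCC record replaces any member-input record.
    if incoming.encoding_level == " ":
        return True
    # Scenario 2: a core-level PCC record replaces a lower-level record,
    # including level 4 without an 042 field.
    if incoming.encoding_level == "4":
        return (
            existing.encoding_level == "4"
            or existing.encoding_level in LOWER_THAN_CORE
        )
    return False


full = Record(" ", True)
core = Record("4", True)
```

With these definitions, `would_replace(core, Record("M", False))` is true, while `would_replace(core, Record("I", False))` is false, mirroring the 'M' and 'I' outcomes reported for the Cornell core-level records.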
If this had been a real batchload and considering both the records that would
be new to WorldCat and the records that would be replaced, WorldCat would contain
325 more PCC records after the load.
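The report does not itemize how the figure of 325 is reached. One reading that reproduces it, offered here as an assumption rather than as OCLC's stated method, adds the records new to WorldCat to the three groups of replacing records described above. The variable names are illustrative.

```python
# Figures quoted in this appendix.
new_to_worldcat = 105  # records with no match in WorldCat

replacing_records = {
    "same_encoding_level": 17,        # of the 50 same-level matches
    "full_over_member_input": 140,    # Encoding level 'blank' scenario
    "core_over_lower_level": 63,      # Encoding level '4' scenario
}

total_replaced = sum(replacing_records.values())
additional_pcc_records = new_to_worldcat + total_replaced

print(total_replaced, additional_pcc_records)
```

Under this reading, 220 existing records are replaced and the total matches the 325 additional PCC records cited above.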
There were 84 cases among the 268 records with different encoding levels
in which the Replaced date in the OCLC record is more recent than the latest
date in the 948 field in the Cornell record. (Cornell uses field 948 to track
activity on the record and, for purposes of this study, the latest date in
field 948 is assumed to be the earliest date on which the record would be available
for FTP to OCLC.) This highlights the need for frequent uploads by batchloaders
and fast turn-around by OCLC. Records age quickly and prompt turn-around on
both sides will help to eliminate duplicative cataloging effort.
Out of all the Cornell records that would replace an existing WorldCat record,
39 of those existing records were vendor records. 20 of those were Harrassowitz
records, 8 were Touzot, 5 were Casalini, 4 were Puvill, and 1 each from Iberbook
and Centro Di.
The 285 records that are potential replaces were compared with the existing
records to determine how notes and access points would be affected by the replace
transactions. The following table summarizes the results:
Table 4. Comparison of Cornell batchloaded and Matching WorldCat Records
| Cornell records had: | Notes | Access points |
|---|---|---|
| Greater number | 41 | 89 |
| Same number and types | 154 | 128 |
| Same number; different types | 13 | 18 |
| Fewer number | 77 | 50 |