Program for Cooperative Cataloging - Library of Congress

Standing Committee on Automation

Task Group on OCLC Batch Processing

June 2001

TG Members: Karen Calhoun (Cornell), Glenn Patton (OCLC), Robert Wolven (Columbia), Edward Weissman (Chair; Cornell)

1. Executive Summary
2. Recommendations
3. Background
4. The Charge and What We Did
5. What We Learned from the Surveys
6. What We Learned from Analyzing the Cornell BIBCO Records
7. Credits
8. Conclusions
Appendix

1. EXECUTIVE SUMMARY

The number of PCC records available in WorldCat could be significantly increased, and the value of BIBCO participants' work maximized, by implementing a few practicable changes in OCLC's batch processing procedures. The body of this report contains the specific recommendations of the SCA Task Group on OCLC Batch Processing.

The current task group revisited issues raised by a 1998 SCA task group on OCLC batchloading. The time was right to reconvene an OCLC batchloading task group, in light of OCLC's invitation of PCC comments on its move to a relational database management system for WorldCat. Besides considering the issues raised by the 1998 group, the task group also considered the findings of three surveys they conducted in April 2001--one survey each of BIBCO liaisons, CONSER liaisons, and OCLC Users Council delegates. Users Council delegates were surveyed due to the concerns they had expressed in one of their meetings about the overlay of OCLC member-contributed records with PCC records.

The survey results suggest that OCLC batchloading libraries that are BIBCO contributors are generally dissatisfied with contributing their Program records via OCLC batch processing. Implementing the task group's recommendations would go far toward remedying their problems. A key finding of the surveys was that BIBCO, CONSER, and OCLC Users Council members generally agree that allowing batchloaded PCC records to replace data in non-PCC records is a desirable change. The survey results further suggest that the most acceptable option for record replacement is to allow a full BIBCO record to replace any non-BIBCO record and a core level BIBCO record to replace a non-BIBCO record with an equal or lower Encoding Level.

In addition to the survey research findings, the task group considered the results of a research project performed by OCLC staff, using a sample of BIBCO new and upgraded records prepared by Cornell in March 2001. Under current OCLC batch processing rules, batchloaded BIBCO upgrades are discarded (after the library's holding symbol is set on the existing record). High barriers also exist for batchloading new BIBCO records. OCLC staff used the Cornell sample to predict what would happen to the sample records if OCLC implemented the task group's batchloading recommendations. The results suggest that, if OCLC implements these batchloading improvements, the number of BIBCO records available for use by the OCLC cataloging community will increase substantially. The changes would also have a positive effect by upgrading vendor records. Further, OCLC's existing procedures for merging certain information as records are replaced will ensure that important classification data and subject access will be maintained.

The complexity of batchloading for CONSER contributors, evolving practice for integrating resources, the transition of many libraries to new integrated library systems, and OCLC's plans for extending WorldCat suggest that the PCC should take a fresh look at the issues of batchloading in two or three years.

2. RECOMMENDATIONS

  1. OCLC should enhance batchloading to:
    1. Accept mixed files containing both BIBCO and non-BIBCO records
    2. Process and load the BIBCO records within 24-48 hours of receipt
    3. Use the following algorithm in batchloading the records:
      1. A full level batchloaded BIBCO record should replace any matching non-BIBCO record in WorldCat.
      2. A core level batchloaded BIBCO record should replace a non-BIBCO record at the core level EL or lower.
  2. BIBCO participants should upload records to OCLC no less than once a week.
  3. CONSER participants should continue to upgrade and maintain records in OCLC for the present. In two to three years, when more experience has been gained in cataloging integrating resources, PCC should consider convening another task group to advise OCLC on batchloading for serial and integrating resource records.
  4. In two-to-three years, convene a new task group on OCLC batchloading. Charge this group to explore the new context for cooperative cataloging and batchloading that will result from
    1. the transition of many PCC libraries to new integrated library systems, and
    2. OCLC's changes to extend WorldCat and develop new services in cataloging and metadata.
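
The replacement rule in recommendation 1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the set of levels treated as "core level or lower" is inferred from the appendix to this report, and the special handling of DLC records mentioned there is not modeled. It does not describe OCLC's actual batchload software.

```python
# Hypothetical sketch of the replacement rule in recommendation 1.3.
# The level grouping below is an assumption drawn from the appendix
# of this report, not OCLC's actual batchload configuration.

FULL = " "   # MARC Encoding Level blank (full level)
CORE = "4"   # MARC Encoding Level 4 (core level)

# Levels treated here as "core level or lower" (assumed from the appendix).
CORE_OR_LOWER = {"4", "5", "7", "8", "K", "M"}


def would_replace(incoming_level: str, existing_level: str,
                  existing_is_pcc: bool) -> bool:
    """Return True if a batchloaded BIBCO record would replace its match."""
    if existing_is_pcc:
        # Both rules apply only to non-BIBCO records in WorldCat.
        return False
    if incoming_level == FULL:
        # Rule 1: a full BIBCO record replaces any matching non-BIBCO record.
        return True
    if incoming_level == CORE:
        # Rule 2: a core BIBCO record replaces a non-BIBCO record
        # at the core level or lower.
        return existing_level in CORE_OR_LOWER
    # Any other incoming level is not a full or core BIBCO record.
    return False
```

For example, under these assumptions `would_replace("4", "M", False)` is true, while `would_replace("4", "I", False)` is false because level I ranks above core.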

3. BACKGROUND

The original OCLC Batchloading Task Group, chaired by Carol Hixson (University of Oregon), was charged in 1998 to advise OCLC on issues surrounding the batchloading of bibliographic and authority records, especially as they related to PCC libraries' contributions. The Group made a number of recommendations, which OCLC considered. Because implementing several key recommendations was tied to the completion of OCLC's Batchload Redesign Project, for which there was no schedule at the time, the PCC Policy Committee discharged the group, with the intent to appoint a new task group once OCLC was prepared to resume work on its batchload design.
At the November 2000 meeting of the PCC Policy Committee, the OCLC member encouraged the Committee to designate a new task group, in light of OCLC's plans for moving to a relational database management system for WorldCat.
Thus a new task group, under the aegis of the PCC Standing Committee on Automation, was constituted and charged to advise OCLC on

  1. the PCC's requirements for batch processing of Program records, and
  2. batchloading in general, in the context of the PCC's broad interest in fostering cooperative cataloging.


4. THE CHARGE AND WHAT WE DID

The Task Group was charged as follows:

  1. Assess the current environment. Conduct a brief survey of PCC libraries to evaluate their current and near-term interest in FTPing BIBCO and CONSER records to OCLC. Also ask for input on enhancements to existing OCLC batchloading services (i.e., is there interest in new functionality?)
  2. Re-evaluate the 1998 Task Group's recommendations. Specifically, what is the relative importance now and in the next two to three years to PCC libraries of
    1. 24-hour turnaround of batch files
    2. batch maintenance of Program records
    3. batch upgrading/merging of existing WorldCat records with PCC records
    4. acceptance of PCC and non-PCC new and changed records in the same FTP file
    5. equal credits for records input online and via FTP
  3. Propose a small research project to OCLC to predict how many and what types of existing member-contributed WorldCat records would be replaced by PCC records, should batch overlay of member records with incoming PCC records be enabled.
  4. Consider changes in the cataloging environment in the next two to three years that would result in additional batch loading requirements to support PCC record transfer, encourage Program growth, and foster cooperative cataloging in general. For example, evaluate the potential impact of PCC contributions and updates of records describing serials and integrating resources.
  5. Work with OCLC Users Council representatives to learn about OCLC members' concerns about batch loading of PCC records. Suggest strategies for addressing these concerns.

The Task Group conducted and analyzed three surveys in April 2001--one each to BIBCO liaisons, CONSER liaisons, and OCLC Users Council delegates. Response rates were satisfactory--BIBCO liaisons returned 27 completed surveys, CONSER 11 surveys, and OCLC Users Council delegates 41 surveys.

A second research project, intended to complement and inform the findings from the surveys, was performed by OCLC. Cornell University Library provided OCLC with a month's sample of BIBCO contributions, 695 records created in March 2001. OCLC evaluated the Cornell records to predict how many and what types of existing OCLC bibliographic records would be replaced by BIBCO records, should batch overlay of member records with incoming BIBCO records be enabled. The results of analysis and the survey results are discussed below.

The OCLC Users Council survey results were discussed with the Users Council at the May meeting. A short presentation was made to the Council as a whole presenting the results. A follow-up discussion was held with the Collections and Technical Services Interest Group. At that session, a preliminary version of the Task Group's recommendations was presented and endorsed by the Interest Group. The Interest Group also suggested that the Task Group's recommendations be publicized widely to the OCLC user community.

5. WHAT WE LEARNED FROM THE SURVEYS

A majority of the reporting BIBCO libraries contribute their new BIBCO records and upgrades online in OCLC (new records, 56%; upgrades 70%). There is little use of CatME among the group as a whole.

Comments suggest that the BIBCO libraries that are working online in OCLC are satisfied with this contribution method. When asked why they have not chosen to batchload their records, they point to limitations of their local systems, better editing in OCLC than in their local cataloging module, a relatively small volume of BIBCO contributions, limitations of current OCLC batchloading functionality, or other reasons.

About 20% of the reporting BIBCO libraries batchload some or all of their BIBCO contributions. These libraries are generally dissatisfied with OCLC batchloading as it currently functions. Problems noted by respondents include having to separate BIBCO and non-BIBCO records into separate FTP files, loss of BIBCO upgrades (because OCLC merely adds the library's holding symbol rather than upgrading the existing OCLC record), problems with consortia or "group" loads, and slow turnaround time for batchloaded files.

As far as enhancements go, the results in Table 1 suggest there is consistent interest among both BIBCO and CONSER libraries in being able to send a "mixed file" via FTP to OCLC--something that OCLC cannot handle at this time. A mixed file contains both PCC program and non-program records, and both new records and upgrades. OCLC would split out the mixed file upon receipt, so that the various kinds of records can be processed appropriately. Among BIBCO respondents, 69% are somewhat, quite, or intensely interested in this enhancement; among CONSER respondents, 64% are.

Table 1. Interest in Being Able to Send a Mixed FTP File.

                        BIBCO               CONSER
                        Number   Percent    Number   Percent
No interest at all      2        7.7%       4        36.4%
Not too interested      6        23.1%      0        0.0%
Somewhat interested     7        26.9%      2        18.2%
Quite interested        6        23.1%      3        27.3%
Intensely interested    5        19.2%      2        18.2%
TOTALS                  26       100.0%     11       100.0%
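
The mixed-file handling described before Table 1 amounts to sorting incoming records along two axes: Program versus non-Program, and new versus upgrade. A minimal sketch follows, assuming each record is a plain dictionary. The keys `"042"` (a list of authentication codes) and `"oclc_number"`, the function name, and the sample data are all hypothetical and chosen only for illustration; they do not reflect how OCLC actually parses or matches records.

```python
from collections import defaultdict


def split_mixed_file(records, existing_ids):
    """Group records by (program status, new or upgrade).

    A record counts as a Program record if its hypothetical "042" list
    contains "pcc", and as an upgrade if its hypothetical "oclc_number"
    is among the identifiers already present in the database.
    """
    groups = defaultdict(list)
    for record in records:
        program = "pcc" if "pcc" in record.get("042", []) else "non-pcc"
        kind = "upgrade" if record.get("oclc_number") in existing_ids else "new"
        groups[(program, kind)].append(record)
    return dict(groups)


# Invented sample data, for illustration only.
sample = [
    {"id": "a", "042": ["pcc"], "oclc_number": None},
    {"id": "b", "042": ["pcc"], "oclc_number": 1001},
    {"id": "c", "042": [], "oclc_number": None},
]
groups = split_mixed_file(sample, existing_ids={1001})
```

Each resulting group could then be routed to the processing appropriate for that kind of record.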

BIBCO, CONSER and Users Council respondents seem to agree that allowing batchloaded PCC records to replace data in non-PCC records is a desirable change. Among BIBCO respondents, over 85% were somewhat, quite, or intensely interested in improved batchloading of BIBCO new records and upgrades. Among CONSER respondents, 64% were interested in being able to do maintenance via batchload, and 73% were interested in being able to upgrade non-CONSER records to CONSER status via batchload. Among Users Council respondents, 78% were somewhat in favor or in favor of changes to OCLC batchloading that would facilitate the contribution of PCC records to WorldCat.

BIBCO liaisons and Users Council delegates were asked to react to several options for how PCC records should replace non-PCC records in WorldCat. The results in Table 2 suggest that an acceptable option would be for any BIBCO core record to replace any less-than-full non-BIBCO record, and for BIBCO full records to replace non-BIBCO records.

Table 2. How Replaces Should Occur

                                                                          BIBCO               Users Council
                                                                          Number   Percent    Number   Percent
Any BIBCO core replace less-than-full; BIBCO full replace all non-BIBCO   11       40.7%      21       51.0%
Any BIBCO core or full replace all non-BIBCO                              5        18.5%      1        2.5%
Merge BIBCO and non-BIBCO                                                 5        18.5%      13       32.0%
Other                                                                     6        22.2%      5        12.0%
Don't change current policy                                               N/A      N/A        1        2.5%
TOTALS                                                                    27       100.0%     41       100.0%

The concept of merging data from PCC and non-PCC records was less appealing to BIBCO and Users Council respondents, and there was little consensus in their comments about how merging should be done (except for a general desire to retain fields that represent a different classification or subject scheme than is present in the BIBCO record).

There was a good deal of consensus among BIBCO respondents about acceptable turnaround times for batchloading. Because of the small number of respondents, it is more difficult to interpret the CONSER findings; however, Table 3 suggests that 24 hours is the most desirable turnaround time among BIBCO liaisons and possibly among CONSER representatives.

Table 3. Desired Turnaround Time for Batchloading

                      BIBCO               CONSER
                      Number   Percent    Number   Percent
24 hours              18       66.7%      6        54.5%
36 hours              0        0.0%       0        0.0%
48 hours              5        18.5%      0        0.0%
72 hours              0        0.0%       1        9.1%
Other or no answer    4        14.8%      4        36.4%
TOTALS                27       100.0%     11       100.0%

Survey respondents were asked a series of questions about the potential impact of OCLC batchloading enhancements on their preferences and behavior. Based on the survey results, BIBCO participants are reluctant (47%) or undecided (42%) about switching to batchloading as a contribution method, even if batchloading were improved. There are, however, a small number of BIBCO contributors for whom batchloading improvements would represent a real step forward for their BIBCO work: New York University, Cornell, Columbia, Harvard, Yale, and Stanford.

While improving batchloading is not likely to increase the number of BIBCO participants, it will almost certainly increase the number of BIBCO records available for use in OCLC. Respondents gave two reasons for this:

  1. BIBCO upgrades would be added to the OCLC database, rather than being discarded in the batchloading process, as they are today; and
  2. a large number of items not now chosen for BIBCO treatment because of the presence of vendor or other minimal-level records in OCLC would become candidates for BIBCO cataloging.

5.1 CONSER Results

Results of the CONSER survey were mixed; not much consensus was evident on most questions. With responses from 11 institutions, it's somewhat difficult to draw conclusions.

Among CONSER respondents, interest in batchloading maintenance and upgrades seems stronger than in batchloading new records. While 54% said they are somewhat to very interested in batchloading new CONSER records, 64% said they are somewhat to very interested in doing CONSER maintenance via batchload, and 73% said they are somewhat to very interested in upgrading non-CONSER records via batchload. Again, however, the apparently large percentage shifts really represent a slight shift in degree of interest from 2-3 libraries, another example of the danger of reading much into these results. Those who expressed interest in doing maintenance via batchload also had diverse suggestions for fields to be retained when records are replaced. The variety of responses illustrates the complications inherent in batch maintenance of dynamic records and suggests that more detailed study is needed before any action is taken.

When asked if batchloading improvements would affect their choices or behavior, CONSER libraries gave highly diverse responses. Some 36% of respondents said they would probably or definitely begin to use batchloading as a contribution method if batchloading were improved; however, 27% said they would probably or definitely not change their current online practices, and another 27% were not sure how they would react. Similarly, opinions were widely split about whether improvements in OCLC batchloading would increase the number of CONSER records these libraries contribute to the OCLC database.

Nevertheless, the results suggest that although not too many CONSER libraries are interested in batchloading, those that are interested include some potentially large contributors: Cornell, Indiana, Michigan, and Harvard.

5.2 Users Council Results

In past meetings, OCLC Users Council delegates have expressed concern about the overlay of member-contributed records in WorldCat with PCC records. The Users Council survey sought to determine Users Council delegates' concerns and opinions about batchloading of PCC records and to gather recommendations for OCLC.

Some 66% of the Users Council respondents say they are somewhat or very familiar with the PCC's mission and goals, or they are actively participating in one or more PCC programs. Another 22% have at least heard of the PCC, but 12% claim no familiarity with the Program at all. When asked about their familiarity with the PCC full and core records, 49% said they are not familiar at all or have only heard of them; the other 51% said they are somewhat or very familiar, or they contribute BIBCO or CONSER records to WorldCat.

As with the BIBCO and CONSER respondent groups, few Users Council respondents are currently using batchload as a contribution method (12%).

As mentioned earlier in this report, 78% of the respondents are "somewhat in favor" or "in favor" of OCLC's changing batchloading to facilitate PCC contributions. It would seem, then, that changes being sought by PCC libraries would be favorably received by a majority of Users Council delegates. As reported in Table 2, Users Council respondents seemed to favor the option of any BIBCO core record's replacing any less-than-full record and any BIBCO full record's replacing any non-BIBCO record in WorldCat.

Like the CONSER and BIBCO groups, few Users Council respondents indicated their libraries' choices or behavior would change as a result of improvements in batchloading. Only 3 respondents stated that they would probably become CONSER participants and only 2 respondents stated that they would probably become BIBCO participants if OCLC implements enhancements to make it easier to contribute PCC records via batchload.

6. WHAT WE LEARNED FROM THE ANALYSIS OF THE CORNELL BIBCO RECORDS

Cornell University Library provided OCLC with a month's sample of BIBCO contributions, 695 records created in March 2001. OCLC staff evaluated the Cornell records to predict how many and what types of existing OCLC bibliographic records would be replaced by BIBCO records if OCLC's batchloading procedures were modified according to the various replace options that were included in the surveys. The full results of analysis appear in an appendix to this report.

Of the 695 records in the Cornell file, 589 could be matched to existing WorldCat records (including 271 records added to WorldCat by Cornell via CatME). 105 records could not be matched to existing records and, if loaded, would be new to WorldCat. Coding problems in one record prevented it from being categorized.

If this had been a real batchload using the criteria recommended by the Task Group and considering both the records that would be new to WorldCat and the records that would be replaced, WorldCat would contain 325 more PCC records after the load. Under current batchloading procedures, none of these records would appear in WorldCat as program records.

Analysis of the replaced records revealed two key points:

  • Comparison of the dates associated with Cornell's activity on the records reinforced the need for prompt turn-around by PCC participants and by OCLC in order to avoid duplicate cataloging efforts.
  • The recommended changes for PCC batchloading would have a positive effect by upgrading vendor records.

In addition, comparison of the notes and access points in the Cornell records and the WorldCat records which would have been replaced revealed that, in 27% of the cases, the Cornell record had fewer notes (not surprising, perhaps, given the core record's emphasis on including only those notes "that support identification of the item"). In only 18% of the cases, the Cornell record had fewer access points. On balance, that seems not to be a significant loss of access given that, in 82% of the cases, the Cornell record had as many or more access points, and all the access points are supported by authority records.

7. CREDITS

The Task Group did not fully explore the issue of OCLC credits for batchloaded records. While this was a topic explored by the initial SCA batchloading task group, since potential PCC batchloading libraries had expressed concern about credits at the time, the current task group found little interest in the topic among respondents to the surveys. Batchloading functionality, rather than credits, appears to be the central concern of PCC libraries today. Task group members did review the history of credits with OCLC, however. They learned that OCLC online credits for contributed records were instituted in the mid-1980s when OCLC began to charge for searching. The credits for the online input of new records and for various kinds of record upgrade activity were instituted to cover the cost of searching that is needed to determine that no record exists already for the item and that may be needed to find other works using the same headings (in order, for example, to determine a unique form of name for a person or corporate body). If the institution creates new records or upgrades existing records in the local system, the OCLC view is that the library has not incurred those searching charges and that, as part of the batchloading process, the OCLC system incurs all the overhead in identifying duplicates and merging data. As a result, the credit for original batchloaded records is lower.

8. CONCLUSIONS

The number of Program records available in WorldCat could be significantly increased, and the value of BIBCO participants' work maximized, by implementing a few practicable changes in OCLC's batchloading procedures. These changes include: accepting mixed files of BIBCO and non-BIBCO records; prompt loading of BIBCO records (combined with frequent uploading by BIBCO libraries); and implementation of an algorithm that allows BIBCO records to replace non-BIBCO records under specified conditions. OCLC's existing procedures for merging certain information as records are replaced will ensure that important classification information and subject access will be maintained.

The situation regarding CONSER records is more complex, complicated both by evolving practice in cataloging integrating resources and by the difficulty of developing a replacement algorithm for dynamic records. These issues would benefit from further PCC study in two-to-three years.

Changes in the cataloging environment also suggest that PCC should revisit broader issues of batchloading in the context of cooperative cataloging in that same 2-3 year time frame, as more member libraries adjust to the capabilities of new integrated library systems and as OCLC implements planned changes in its cataloging interfaces.


APPENDIX: ANALYSIS OF CORNELL'S MARCH 2001 BIBCO RECORDS

There were 695 records in the file representing all of Cornell's PCC activity for the month of March 2001. 589 of those could be matched to existing WorldCat records (including 271 records added to WorldCat by Cornell via CatME). 105 records could not be matched to existing records and, if loaded, would be new to WorldCat. Coding problems in one record prevented it from being categorized.

Of the 318 records that matched existing records (excluding the Cornell originals added via CatME), 50 (16%) of those had the same encoding level as the WorldCat record and 268 (84%) had a different encoding level.

Of the 50 records with the same encoding level as the existing WorldCat record, 17 Cornell records would replace the existing non-PCC record. In the cases that would not replace, the existing records are DLC full or core-level records or full or core-level records from other PCC participants. Also in this group are 9 Cornell records with Encoding Level I or M which should presumably have been changed to either 4 or blank.

Of the 268 records with different encoding levels, 123 (44%) of those were Cornell Enc Lvl '4' records and 138 (53%) were Cornell Encoding level 'blank' records. The remaining 7 Cornell records had another encoding level value (including values 1, 5, 7, 8, and u) which should presumably have been changed to either 4 or blank.

Using the scenario of "any PCC full record would replace any member-input record", all 140 of the Encoding level 'blank' records would replace the existing WorldCat records. Of the 123 Encoding level '4' records, using the scenario of "any PCC core record would replace a lower level record (including level 4 without an 042 field)", 63 (51%) would replace the existing WorldCat records. Of that number, 8 of the WorldCat records are Encoding level 'K', 40 are 'M', 1 is a '4' without an 042, 10 are '5', 2 are '7', and 2 are '8' (non-DLC).

The other 60 (49%) Cornell core-level records would not replace the existing WorldCat records because the Encoding Level of the WorldCat record is higher. Of those WorldCat records, 15 were Encoding level 'blank', 3 were '4' with 042 'pcc', and 42 were 'I'.

If this had been a real batchload and considering both the records that would be new to WorldCat and the records that would be replaced, WorldCat would contain 325 more PCC records after the load.
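
The total of 325 is not broken down explicitly in this appendix. One reading that reproduces it, offered here as an inference from the counts given above and not as OCLC's stated calculation, adds the unmatched records to the three groups of replacements. The variable names are illustrative.

```python
# Counts taken from the paragraphs above; the grouping is an inference.
new_to_worldcat = 105       # Cornell records with no matching WorldCat record
same_level_replaces = 17    # same encoding level, existing record non-PCC
full_replaces = 140         # Encoding level 'blank' records that would replace
core_replaces = 63          # Encoding level '4' records that would replace

added_pcc_records = (
    new_to_worldcat + same_level_replaces + full_replaces + core_replaces
)
print(added_pcc_records)  # 325
```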

There were 84 cases among the 268 records with different encoding levels in which the Replaced date in the OCLC record is more recent than the latest date in the 948 field in the Cornell record. (Cornell uses field 948 to track activity on the record and, for purposes of this study, the latest date in field 948 is assumed to be the earliest date on which the record would be available for FTP to OCLC.) This highlights the need for frequent uploads by batchloaders and fast turn-around by OCLC. Records age quickly and prompt turn-around on both sides will help to eliminate duplicative cataloging effort.
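
The date comparison described above can be expressed as a simple check. This is a sketch only: the function name is hypothetical, the dates are invented, and extracting the Replaced date and the 948 dates from actual records is left out.

```python
from datetime import date
from typing import Iterable


def replaced_after_local_activity(oclc_replaced: date,
                                  dates_948: Iterable[date]) -> bool:
    """True if the WorldCat record changed after the latest 948 date."""
    return oclc_replaced > max(dates_948)


# Invented dates, for illustration only.
stale = replaced_after_local_activity(
    date(2001, 3, 20), [date(2001, 3, 5), date(2001, 3, 12)]
)
fresh = replaced_after_local_activity(date(2001, 3, 1), [date(2001, 3, 5)])
```

A record for which the check is true was changed in WorldCat after the library finished its own work, which is the situation frequent uploads and fast turn-around are meant to reduce.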

Out of all the Cornell records that would replace an existing WorldCat record, 39 of those existing records were vendor records. 20 of those were Harrassowitz records, 8 were Touzot, 5 were Casalini, 4 were Puvill, and 1 each from Iberbook and Centro Di.

The 285 records that are potential replaces were compared with the existing records to determine how notes and access points would be affected by the replace transactions. The following table summarizes the results:

Table 4. Comparison of Cornell batchloaded and Matching WorldCat Records

Cornell records had:            Notes    Access points
Greater number                  41       89
Same number and types           154      128
Same number; different types    13       18
Fewer number                    77       50
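
The percentages quoted in section 6 (27% with fewer notes, 18% with fewer access points, 82% with as many or more access points) can be recomputed from Table 4. The short check below uses illustrative dictionary keys and a hypothetical helper function; the counts are those in the table.

```python
# Counts from Table 4; key names are illustrative.
table4 = {
    "notes": {"greater": 41, "same": 154, "same_diff_types": 13, "fewer": 77},
    "access_points": {"greater": 89, "same": 128, "same_diff_types": 18, "fewer": 50},
}


def share(counts, *keys):
    """Percentage of records in the named rows, rounded to a whole number."""
    total = sum(counts.values())
    return round(100 * sum(counts[k] for k in keys) / total)


fewer_notes = share(table4["notes"], "fewer")
fewer_access = share(table4["access_points"], "fewer")
as_many_or_more_access = share(
    table4["access_points"], "greater", "same", "same_diff_types"
)
print(fewer_notes, fewer_access, as_many_or_more_access)  # 27 18 82
```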
  January 3, 2008