Library of Congress
Pinyin Conversion Project

Remaining Conversion and Cleanup Tasks

November 5, 2001


REMAINING CONVERSION TASKS

1 Non-Chinese bibliographic records in the LC database

OCLC will send LC a file of all of the non-Chinese DLC records in the WorldCat database that were selected as candidates for conversion. The file will provide images of these records before and after the OCLC conversion. The records will be evaluated and manually converted in the LC database. Steps will be taken to ensure that less-full LC records do not overlay fuller records in WorldCat.

After converting those records, the remaining Wade-Giles headings, Chinese subject headings, and former conventional Chinese place names in the LC database must be found and converted to pinyin. For example, we hope to compare the former (Wade-Giles) headings that were converted to pinyin (and marked $wnne) on authority records with headings in the LC database; then, when matches are found, the headings will be converted. Records for instrumental music are not identified by language, so finding romanized Chinese on them will prove to be a challenge.
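The heading comparison described above might be sketched roughly as follows. This is only an illustration, not the procedure LC will use: the record layout (plain dictionaries with "heading" and "former_headings" keys) and the function names are hypothetical, and the only sample data is a heading pair taken from this report.

```python
import re

def normalize(heading):
    """Normalize a heading for matching: collapse spaces, trim punctuation, lowercase."""
    return re.sub(r"\s+", " ", heading).strip(" .,;").lower()

def build_conversion_map(authority_records):
    """Map each former Wade-Giles heading marked 'nne' to the current pinyin heading.

    Hypothetical layout: each authority record is a dict with the pinyin
    'heading' and a list of 'former_headings', each carrying its $w value.
    """
    mapping = {}
    for record in authority_records:
        for former in record["former_headings"]:
            if former.get("w") == "nne":
                mapping[normalize(former["heading"])] = record["heading"]
    return mapping

def convert_headings(bib_headings, mapping):
    """Return (original, result, was_converted) for each bibliographic heading."""
    results = []
    for heading in bib_headings:
        pinyin = mapping.get(normalize(heading))
        if pinyin is None:
            results.append((heading, heading, False))  # no match: leave untouched
        else:
            results.append((heading, pinyin, True))
    return results

authority = [{
    "heading": "Pi Xian (Jiangsu Sheng, China)",
    "former_headings": [
        {"heading": "P'i-hsien (Kiangsu Province, China)", "w": "nne"},
    ],
}]
mapping = build_conversion_map(authority)
for row in convert_headings(["P'i-hsien (Kiangsu Province, China)", "Tokyo (Japan)"], mapping):
    print(row)
```

Matching on whole normalized headings, rather than on individual syllables, keeps the comparison from touching romanized Japanese or Korean text that happens to share syllables with Wade-Giles.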

2 Non-Chinese JACKPHY records on RLIN

The Library's official JACKPHY records reside in RLIN. Strategies for converting Wade-Giles elements in these records will be discussed with RLG.


CLEANUP TASKS

This is the status of pinyin cleanup tasks on November 5, 2001:

Authority records

1 Undifferentiated personal names

We anticipate that the project to review and correct some 8400 undifferentiated (non-unique) authority records with romanized Chinese personal names will be completed before the end of 2001. The volunteer NACO contributors who have worked so diligently to bring order to the national authority file have rendered a great service to librarians and library users.

Authority and bibliographic records:

2 Changes to romanization practices – policy specialists have met several times with catalogers and team leaders to identify Wade-Giles romanization practices that were inconsistent, as well as portions of the new pinyin romanization guidelines that need to be clarified further. When final decisions have been announced, appropriate changes will be made to bibliographic and authority records on a high-priority basis.

Serial records

3 Approximately 1000 Chinese serial records were marked for review. They will be corrected manually.

4 OCLC converted 652 non-Chinese CONSER records, of which 490 are LC records. All of these records will be manually reviewed and corrected, since most headings for personal names were not converted, and the records on which they are found were not marked.

Bibliographic records

5 Error reports resulting from internal LC analyses of imported records [done]

6 Headings which could have 'double-converted' were checked to make sure that they converted correctly [done]; those headings were:

P'i-hsien (Kiangsu Province, China) == Pi Xian (Jiangsu Sheng, China)
T'eng-hsien (Shantung Province, China) == Teng Xian (Shandong Sheng, China)
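One way to see why headings such as these are at risk, assuming that "double-converted" means the conversion was in effect applied a second time to its own output: the Wade-Giles syllable p'i becomes pinyin pi, but pi is itself a valid Wade-Giles syllable that becomes bi. The sketch below uses a toy five-entry table (a full table has several hundred syllables) and hypothetical function names; it does not reproduce the check LC actually ran.

```python
# Toy Wade-Giles -> pinyin table, for illustration only.
WG_TO_PINYIN = {
    "p'i": "pi",
    "pi": "bi",
    "t'eng": "teng",
    "teng": "deng",
    "hsien": "xian",
}

def convert_once(syllables):
    """Convert each syllable a single time; unknown syllables pass through."""
    return [WG_TO_PINYIN.get(s.lower(), s.lower()) for s in syllables]

def at_risk_of_double_conversion(syllables):
    """True if converting the already-converted form would change it again."""
    once = convert_once(syllables)
    twice = convert_once(once)
    return once != twice

for heading in (["P'i", "hsien"], ["T'eng", "hsien"], ["hsien"]):
    print(heading, convert_once(heading), at_risk_of_double_conversion(heading))
```

Both headings listed above are flagged by this test, while a syllable such as hsien is not, because xian is not a Wade-Giles syllable and so cannot be converted again.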

7 Chinese bib records marked for review:

Originally, more than 20,000 records, or about 14% of all of the converted records, were marked for review. Fewer than half of these records have conversion errors in access points; those records are being manually corrected. We anticipate that corrections to these records will be completed in December 2001 or January 2002. The remaining records are being put aside at this time, and may be corrected on an as-encountered basis.

8 Subject headings that did not convert, or that have changed since the conversion specifications were prepared, have been identified and corrected.

9 Some subject headings and subject subdivisions for regions in China did not convert to pinyin form, while others are transcribed incorrectly on bib records. Therefore, all subject headings for regions in China will be reviewed and converted or corrected where necessary.

10 Bibliographic file maintenance following the cleanup of authority records has been completed.

11 Several multi-syllable generic terms for place names converted incorrectly (as identified in the portion of the home page describing the conversion of bibliographic records). At this time, all of these errors have been corrected, except for the term diqu. That term was converted incorrectly on more than 1000 Chinese records. Corrections are under way, and should be completed by the end of 2001 or early in 2002.

12 After the above corrections have been completed, a final sweep will be conducted to identify Wade-Giles strings that had not been otherwise identified. We will search for:

- former Chinese conventional place names, Chinese subject headings, and the most-used Chinese personal names;

- particular syllables that may have converted incorrectly: for example, on records with romanized Japanese, no may have converted to nuo, to may have converted to tuo, kai may have converted to gai, etc.; on records with romanized Korean, cho may have converted to zhuo, chang to zhang, etc.;

- individual Wade-Giles syllables.
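The second kind of search in this sweep could be sketched along the following lines. The syllable pairs come from the examples above; everything else is an assumption made for illustration: the record layout, the function name, the use of the MARC language codes jpn and kor to select records, and the sample title.

```python
import re

# Pinyin syllables that may be mis-converted Japanese or Korean, keyed by
# language code, each mapped to the romanization it probably replaced.
SUSPECT_SYLLABLES = {
    "jpn": {"nuo": "no", "tuo": "to", "gai": "kai"},
    "kor": {"zhuo": "cho", "zhang": "chang"},
}

def flag_suspect_syllables(record):
    """Return (tag, word_found, probable_original) for each suspect word.

    Hypothetical layout: a record is a dict with a 'language' code and a
    list of (tag, text) pairs under 'fields'.
    """
    suspects = SUSPECT_SYLLABLES.get(record["language"], {})
    hits = []
    for tag, text in record["fields"]:
        for word in re.findall(r"[A-Za-z']+", text):
            original = suspects.get(word.lower())
            if original is not None:
                hits.append((tag, word, original))
    return hits

record = {
    "language": "jpn",
    "fields": [("245", "Genji monogatari nuo sekai"), ("260", "Tokyo")],
}
print(flag_suspect_syllables(record))
```

A sweep of this kind only produces candidates; each hit would still need review by a cataloger, since the same syllables are legitimate in romanized Chinese.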

13 Single-syllable generic terms for jurisdictions: There will be inconsistencies in capitalization of terms such as Sheng (province), Xian (county), and Shi (city) on converted records because the conversion program capitalized these terms when they appeared in certain subfields (most portions of access points), but not in other subfields (descriptive subfields, such as 245$a and 440$a). Other generic terms for jurisdictions, such as Zhou (district) and Fu (prefecture), were not capitalized by the machine program. Capitalization errors will be corrected when they are encountered.
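Records affected by this inconsistency could be listed for review with a check like the one below. It is a sketch under stated assumptions: fields are modeled as hypothetical (tag, text) pairs, the function name is invented, and the check only reports that a term appears in both capitalized and lowercase forms in one record. It does not decide which form is correct, and because syllables such as shi and xian are also ordinary words, every hit needs human review.

```python
import re

# The three generic terms named above as capitalized by the conversion program.
GENERIC_TERMS = {"sheng", "xian", "shi"}

def capitalization_variants(fields):
    """Return {term: forms} for terms that occur in more than one capitalization."""
    seen = {}
    for _tag, text in fields:
        for word in re.findall(r"[A-Za-z]+", text):
            key = word.lower()
            if key in GENERIC_TERMS:
                seen.setdefault(key, set()).add(word)
    return {term: forms for term, forms in seen.items() if len(forms) > 1}

fields = [
    ("245", "Jiangsu sheng di tu"),    # descriptive field: term left lowercase
    ("651", "Jiangsu Sheng (China)"),  # access point: term capitalized
]
print(capitalization_variants(fields))
```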

14 880 (parallel) fields that converted differently than roman fields will be corrected on an as-encountered basis.


Library of Congress Help Desk (11/07/2001)