05/15/08 [updated 07/09/08]
The major authority record exchange partners (British Library, Library and Archives Canada, Library of Congress, National Library of Medicine, and OCLC) have developed a plan to allow the addition of non-Latin script data (also known as nonroman script data) to name authority records distributed as part of the NACO program. Some basic facts:
- The addition of non-Latin script data is scheduled to begin on July 13, 2008—see Scheduling below.
- Non-Latin script references in authority records will be found in 4XX fields, not parallel 880 fields, and at least initially, the scripts that can be used will be limited—see Basic information below.
- The authority file will be pre-populated with non-Latin references based on bibliographic headings that match entities in the authority file—see Pre-population of the authority file below.
- Libraries should use caution in how they edit or delete the pre-populated data—see Policies for reference construction below.
- Libraries with bibliographic records containing only non-Latin script headings should use caution when comparing those headings against the NACO file, as unwanted flips to romanized forms could result—see Miscellaneous section below.
The following FAQ addresses additional questions about the project:
Q1. Will 880 fields be used in authority records for non-Latin scripts?
A1. No. Although the general practice for bibliographic records in the Anglo-American context has been to follow MARC 21’s “Model A” (roman script data in regular MARC fields, non-Latin script data in 880 fields that parallel regular MARC fields), a different approach has been selected for use in authority records. Authorities will use MARC 21’s “Model B” for multi-script records, where non-Latin script data is entered into the same MARC tags as roman/romanized data, and the non-Latin fields are not linked to parallel roman fields. As a result, field 880 (Alternate Graphic Representation) will not be used (thus $6 (Linkage) will not be used), nor will field 066 (Character sets present).
Q2. What fields in the authority record can have non-Latin scripts?
A2. Non-Latin scripts will be added only in references on name authority records (i.e., 4XX fields), and in selected note fields—667 (Nonpublic general note), 670 (Source data found), and 675 (Source data not found). Some fields, of course, will contain a mix of scripts (e.g., 670 citations with headings represented in both non-Latin script and romanized form). Additional fields may be used in the future, based on user feedback. Note that authorized headings (i.e., 1XX and 5XX) in authority records must be in romanized form.
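The field restrictions in A2 can be expressed as a simple tag check. The following is a minimal sketch, not part of any NACO tooling; the function name `non_latin_allowed` and the constant `ALLOWED_NOTE_TAGS` are hypothetical names chosen for illustration.

```python
import re

# Note fields that the FAQ says may carry non-Latin script data.
ALLOWED_NOTE_TAGS = {"667", "670", "675"}


def non_latin_allowed(tag: str) -> bool:
    """Return True if the FAQ permits non-Latin script data in this tag.

    Allowed: 4XX references and the 667, 670, 675 note fields.
    Everything else (including 1XX and 5XX headings) must stay romanized.
    """
    if not re.fullmatch(r"[0-9]{3}", tag):
        raise ValueError(f"not a three-digit MARC tag: {tag!r}")
    return tag.startswith("4") or tag in ALLOWED_NOTE_TAGS


if __name__ == "__main__":
    for tag in ("100", "400", "410", "510", "667", "670", "675", "880"):
        print(tag, non_latin_allowed(tag))
```

If additional fields are opened up in the future, as A2 anticipates, only the tag set would need to change.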
Q3. Can I add non-Latin script references to records for all types of headings?
A3. Yes. Although the pre-population will only supply non-Latin references for personal names and corporate bodies tagged 110, catalogers can add non-Latin references for conferences, geographic names, uniform titles (including series), name/title headings, etc.
Q4. What scripts might I see in authority records?
A4. For the initial implementation period, the use of non-Latin scripts will be limited to those scripts that represent the MARC-8 repertoire of UTF-8 (Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, and Korean scripts). Some authority records will contain multiple non-Latin script references in either the same script (for variants in a given language), or in different scripts. In fact, the authority record for someone like William Shakespeare will likely have references from all of the scripts available for the initial implementation. It is expected that future phases will expand to non-Latin scripts outside the MARC-8 repertoire.
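As a rough illustration of how a local system might report which of these scripts appear in a reference, the sketch below inspects Unicode character names using Python's standard `unicodedata` module. It assumes that the leading word of a character's Unicode name is an adequate script hint, which is a simplification; the names `SCRIPT_PREFIXES` and `scripts_in` are hypothetical. Han ideographs are reported under a single label because one character cannot be attributed to Chinese, Japanese, or Korean on its own.

```python
import unicodedata

# Unicode character-name prefixes mapped to script labels.
# Assumption: the name prefix is a good-enough hint for these scripts.
SCRIPT_PREFIXES = {
    "ARABIC": "Arabic",
    "CJK": "Han (CJK ideographs)",  # shared by Chinese, Japanese, Korean
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "HEBREW": "Hebrew",
    "HIRAGANA": "Japanese kana",
    "KATAKANA": "Japanese kana",
    "HANGUL": "Korean",
}


def scripts_in(text: str) -> set[str]:
    """Return the set of script labels found in the text."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        for prefix, label in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                found.add(label)
                break
    return found


if __name__ == "__main__":
    print(scripts_in("Shakespeare, William"))
    print(scripts_in("\u0428\u0435\u043a\u0441\u043f\u0438\u0440"))
```

A record such as the one anticipated for William Shakespeare would yield several labels once all of its 4XX fields are examined.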
Q5. If I create a new authority record with a romanized heading, am I required to add non-Latin script data to the authority record?
A5. No. Adding non-Latin script data to authority records will be optional for NACO participants.
Q6. We’ve been inputting non-Latin headings in bibliographic records for many years—will it be possible to make use of this data?
A6. Yes. OCLC has developed a capability to pre-populate the NACO authority file with non-Latin references (authority 4XX fields) derived from non-Latin bibliographic heading fields in WorldCat, making use of data-mining techniques developed for the WorldCat Identities project. Harvesting non-Latin heading forms from bibliographic records that correspond to entities in the authority file will provide immediate value for the authority file, building on the significant intellectual work of the many institutions that have supplied non-Latin script headings on bibliographic records for decades. Note that this pre-population will occur only on name authority records for personal names and for corporate bodies tagged 110. For specific examples, see the WorldCat Identities entries for Sun Yat-sen or Isaac Bashevis Singer, which show the array of possible non-Latin scripts related to these names. While OCLC’s algorithms may do some transformations of bibliographic headings to make them more consistent with the authorized heading in the authority record (e.g., adding a death date to a reference when it has already been added to the heading), the pre-population will generally reflect the variety of bibliographic practices. The general philosophy is that it will be easier to delete unneeded references, based on policies yet to be developed, than to adjust the references.
Q7. How long will it take to pre-populate the authority file with non-Latin script references?
A7. The updating of records is expected to begin no earlier than July 13, 2008, but the number of records updated each day will be capped in order to limit the impact of changes on library systems that load authority records. The number of records updated each day will gradually increase, from about 100 per day to several thousand per day. As a result, it will take many weeks to complete the pre-population, since there are potentially 480,000 records to be updated.
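To get a feel for why the work spans many weeks, the arithmetic can be sketched as below. Only the starting rate of about 100 per day and the total of 480,000 records come from this FAQ; the daily growth factor and the cap of 3,000 records per day are purely hypothetical values chosen for illustration, since the actual ramp-up schedule is not stated.

```python
def days_to_finish(total: int, start_rate: int, max_rate: int,
                   daily_growth: float) -> int:
    """Count the days needed to update `total` records.

    The daily rate starts at `start_rate`, is multiplied by
    `daily_growth` each day, and never exceeds `max_rate`.
    """
    if start_rate <= 0 or max_rate <= 0 or daily_growth < 1.0:
        raise ValueError("rates must be positive and growth must be >= 1.0")
    remaining = total
    rate = float(start_rate)
    days = 0
    while remaining > 0:
        remaining -= min(int(rate), max_rate)
        rate = min(rate * daily_growth, float(max_rate))
        days += 1
    return days


if __name__ == "__main__":
    # Hypothetical ramp: 10% growth per day, capped at 3,000 per day.
    print(days_to_finish(480_000, start_rate=100, max_rate=3_000,
                         daily_growth=1.10))
```

Even at a steady 3,000 records per day, 480,000 records would need at least 160 days, so any ramp that starts at 100 per day takes longer than that.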
Q8. How will I know if a record has been pre-populated with non-Latin script references?
A8. Name authority records that have been through the pre-population process will initially exhibit several characteristics:
- 008/29 (Reference evaluation) will be set to value ‘b’ (Tracings are not necessarily consistent with the heading);
- a 667 field (Nonpublic general note) will indicate that non-Latin references have been added programmatically (the note will read: “Machine-derived non-Latin script reference project.”);
- an additional 667 note will indicate that the non-Latin script references have not been evaluated;
- non-Latin references will be found in 4XX fields;
- 670 (Source data found) citations will not be made.
After the initial pre-population, catalogers will be able to add 670 citations, and will be able to add, modify, or delete non-Latin references, but all records with non-Latin script references should continue to have 008/29=b.
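The two most easily tested markers from A8 (008/29 set to ‘b’ and the project 667 note) could be checked as in the sketch below. The record layout used here, a dict with an `"008"` string and a `"fields"` list of `(tag, text)` tuples, is a hypothetical simplification and not the structure of any particular MARC library; `looks_prepopulated` is likewise an illustrative name.

```python
# Note text quoted in the FAQ for programmatically updated records.
PREPOP_NOTE = "Machine-derived non-Latin script reference project."


def looks_prepopulated(record: dict) -> bool:
    """Return True if the record shows the pre-population markers.

    Checks that 008 position 29 (zero-based) is 'b' and that a 667
    field contains the project note.
    """
    fixed = record.get("008", "")
    ref_eval_is_b = len(fixed) > 29 and fixed[29] == "b"
    has_project_note = any(
        tag == "667" and PREPOP_NOTE in text
        for tag, text in record.get("fields", [])
    )
    return ref_eval_is_b and has_project_note


if __name__ == "__main__":
    # Hypothetical record built only to exercise the check.
    sample = {
        "008": " " * 29 + "b" + " " * 10,
        "fields": [("667", PREPOP_NOTE)],
    }
    print(looks_prepopulated(sample))
```

Because catalogers may later edit these records, the check indicates only that a record passed through the process, not that its references are unchanged.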
Q9. Can I edit or delete non-Latin references that have been programmatically added to name authority records via pre-population?
A9. Yes, but please use judgment and err on the side of caution. Since a cataloger probably won’t have readily at hand the work that gave rise to a pre-populated non-Latin reference, caution should be exercised in removing or editing such references. Certain changes are encouraged: for example, egregious errors can be corrected (e.g., obvious typos apparent to someone knowledgeable in the language), and references that were obviously added to the wrong heading can be removed. Because institutions have adopted variant practices as to what constitutes the correct form of a non-Latin heading (e.g., whether cataloger-added additions are supplied, and in what script), generally add an additional non-Latin reference to represent a variant formulation preferred by your institution rather than changing an existing reference that represents another formulation of the same name. Please note these types of variations, and express a preference if possible, when providing feedback as requested in the NACO Non-Latin White Paper.
Q10. I see different practices used by different language catalogers/institutions for constructing parallel non-Latin headings in bibliographic records—why is that?
A10. Because there have not been uniform practices in the use and form of non-Latin headings in bibliographic records, this lack of uniformity will be reflected in the pre-population of the authority file. We ask catalogers to observe the different practices that result from the pre-population during the initial observation period (until January 1, 2009); this observation period will put catalogers in a better position to recommend future best practices for the community at large to follow. Many of the issues that will need to be resolved during this period are discussed in a NACO Non-Latin White Paper, and institutions with experience working in non-Latin scripts are encouraged to provide input on the issues it raises. Instructions for providing that input will be shared in the coming months.
Q11. How should I construct non-Latin references in authority records?
A11. For libraries that choose to apply the option to provide non-Latin script references in authority records, catalogers should, until January 1, 2009, generally follow the same practices for authority 4XXs that they have used in the past for parallel headings in bibliographic records. Several Library of Congress Rule Interpretations (LCRIs) related to reference construction will be relaxed to reflect the reality of the non-Latin references that will be added during the pre-population and by catalogers prior to January 1, 2009. Once consensus on best practices for non-Latin script references is reached after January 1, 2009, LCRIs will be adjusted to reflect that consensus. Until that time, flexibility will be the key; there is generally no real ‘right’ or ‘wrong’ approach.
Q12. How will I know which non-Latin reference is the valid non-Latin form for a specific language or script?
A12. All non-Latin forms will be added as references, and there may be multiple references for any given language or script—no attempt will be made to signal any one of the references as the valid non-Latin form for the 1XX heading. In a later phase of this project, after issues raised in the NACO Non-Latin White Paper have been resolved, it may be possible to indicate a preferred non-Latin form for a specific language or script, but it is unlikely that field 4XX will be used for this purpose. Stay tuned for future announcements related to this topic.
Q13. Can I ‘evaluate’ non-Latin script references on name authority records?
A13. Not until guidelines for the evaluation of references are developed, after many of the issues found in the NACO Non-Latin White Paper have been resolved. Once reference evaluation guidelines are developed, techniques to indicate which non-Latin script references have been evaluated using field 667 (Nonpublic general note) will be issued, as it is unlikely that records with non-Latin references in several languages and scripts can all be evaluated by the same cataloger. Until reference evaluation guidelines are published, all name authority records with non-Latin script references should contain 008/29 (Reference evaluation) value ‘b’ (Tracings are not necessarily consistent with the heading).
Q14. When can I begin adding non-Latin script data to name authority records?
A14. The answer depends on whether the record already existed before Day 1 or was newly created on or after Day 1.
Existing authority records: For records that already exist in the NACO file on Day 1 (no earlier than July 13, 2008), care must be taken during the several-week-long pre-population period to limit activity on records that are candidates for pre-population. Catalogers are asked not to add non-Latin references until one of the following occurs:
- it is evident that the record has been updated programmatically as part of the pre-population process (a 667 field indicating this will be found), or
- if it is not evident that the record has been programmatically pre-populated, an announcement is made that the pre-population period has ended.
Your patience is appreciated.
Newly created records: You may begin to add non-Latin script data to new name authority records created on or after July 13, 2008. Catalogers are asked to set the 008/29 (Reference evaluation) value to ‘b’ (Tracings are not necessarily consistent with the heading) on all newly created records with non-Latin script references until such time as reference evaluation guidelines are developed. Catalogers should also add a 667 "Non-Latin script references not evaluated," or 667 "Non-Latin script reference not evaluated."
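The two steps asked of catalogers for newly created records (set 008/29 to ‘b’ and add the 667 note) might be automated along the following lines. This is a sketch under stated assumptions: the record layout (a dict with an `"008"` string and a `"fields"` list of `(tag, text)` tuples) is hypothetical, the non-Latin test is a heuristic based on Unicode character names, choosing the plural note when more than one non-Latin reference is present is this example's interpretation, and the note text is stored without trailing punctuation.

```python
import unicodedata


def has_non_latin(text: str) -> bool:
    """Heuristic: True if any cased or 'other' letter is not a Latin letter.

    Modifier letters and combining marks, which occur in romanized
    headings, are deliberately ignored.
    """
    for ch in text:
        if unicodedata.category(ch) in ("Lu", "Ll", "Lo"):
            if not unicodedata.name(ch, "").startswith("LATIN"):
                return True
    return False


def mark_unevaluated(record: dict) -> dict:
    """Set 008/29 to 'b' and add the 667 note if non-Latin 4XXs exist.

    The record is modified in place and also returned.
    """
    refs = [text for tag, text in record["fields"]
            if tag.startswith("4") and has_non_latin(text)]
    if not refs:
        return record
    fixed = record["008"].ljust(40)  # authority 008 has 40 positions
    record["008"] = fixed[:29] + "b" + fixed[30:]
    note = ("Non-Latin script references not evaluated" if len(refs) > 1
            else "Non-Latin script reference not evaluated")
    if ("667", note) not in record["fields"]:
        record["fields"].append(("667", note))
    return record
```

Calling `mark_unevaluated` a second time on the same record leaves it unchanged, so the function is safe to run as a pre-save check.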
Q15. Will records with non-Latin scripts be distributed by the Cataloging Distribution Service?
A15. Name authority records with non-Latin script data will be distributed with all other name authority records, as part of the CDS-Names distribution product. While the OCLC-initiated pre-population of existing authority records is ongoing, the number of records distributed each week is expected to be very large.
Q16. What effects will the non-Latin references have on local systems and on authority control processing?
A16. Effects may vary depending on the system. Announcements of the change and files of sample records with non-Latin references have been made available to the vendor community. You may wish to contact your vendor to verify what the effects may be.
Q17. Are there special normalization rules to be followed for non-Latin references?
A17. The Authority File Comparison Rules (as modified 2007-01-11) were recently redefined with non-Latin scripts in mind. Generally, non-Latin references shouldn’t cause comparison conflicts: references are permitted to match references on other records, and non-Latin references will never conflict with authorized headings because authorized headings will always be in roman/romanized script. Note, however, that the new comparison rules forbid a reference that normalizes to the same form as another reference on that same record.
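The same-record restriction can be illustrated with the sketch below. The function `simple_normalize` is a deliberately simplified stand-in (Unicode NFC, case folding, punctuation replaced by spaces) and is not the official Authority File Comparison Rules algorithm; both function names are hypothetical.

```python
import unicodedata


def simple_normalize(heading: str) -> str:
    """Simplified stand-in for the comparison rules (an assumption).

    Applies NFC, case-folds, turns punctuation into spaces, and
    collapses runs of whitespace.
    """
    text = unicodedata.normalize("NFC", heading).casefold()
    kept = [ch if (ch.isalnum() or unicodedata.combining(ch)) else " "
            for ch in text]
    return " ".join("".join(kept).split())


def duplicate_references(references: list[str]) -> list[tuple[str, str]]:
    """Return pairs of references on one record that normalize alike.

    Each pair is (first reference seen, later reference that clashes).
    """
    seen: dict[str, str] = {}
    clashes = []
    for ref in references:
        key = simple_normalize(ref)
        if key in seen:
            clashes.append((seen[key], ref))
        else:
            seen[key] = ref
    return clashes


if __name__ == "__main__":
    refs = ["Smith, John", "smith john", "Smith, Jon"]
    print(duplicate_references(refs))
```

The check is scoped to a single record's 4XX fields, reflecting the point in A17 that references may match references on other records.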
Q18. Will LC be adding non-Latin script data to LCSH subject authorities?
A18. No. LC has no plans at this time to add non-Latin script data to LCSH subject authority records, although Library of Congress Classification records do already contain non-Latin script data in Chinese, Arabic, Hebrew, Cyrillic, and Greek.