NAME: Alternate graphics without 880 in Bibliographic, Holdings, Authority, and Community Information records
SOURCE: Library of Congress
SUMMARY: This paper discusses the problems with the field 880 technique for alternate graphic data in MARC records and proposes a simplified method for consideration.
KEYWORDS: Field 880 (BD/AD/CI/HD); Unicode; Alternate graphics
RELATED: DP100 (June 1997); DP109 (June 1998)
5/6/98 - Forwarded to the USMARC Advisory Group for discussion at the June 1998 MARBI meetings.
7/29/98 - Results of USMARC Advisory Group discussion - There were only a few minutes left in the meeting to discuss this proposal. RLG and OCLC indicated that changing the 880 as indicated in the DP would result in large design efforts for each. Both utilities use the 880 construct for import and export of records, although RLG clarified that internally they do not retain the construct; they duplicate fields and use script flags to differentiate. OCLC indicated it does use the construct internally. Vendors that handle non-roman data indicated that they sometimes use the 880 internally, depending on the script needs of the countries in which they work.
It was noted that there are differences in interpreting whether the 880 fields are intended only for non-roman data. The DP implies that they can hold either roman or non-roman data, depending on the script of the regular fields, but a strict reading of the format indicates they are for non-roman data, and some systems have interpreted them that way. Due to time constraints, discussion should continue at the next meeting.
DISCUSSION PAPER NO. 111: Alternate graphics without field 880
Some agencies need to record data in a record in more than one script because (1) the item cataloged is multiscript, (2) different orthographies are used for a language (e.g., kana and kanji scripts for Japanese; devanagari, khmer, and lao scripts for Pali), and (3) transliteration of some or all data in a record is needed to support limitations on automated systems or telecommunications systems and/or needs of users.
In the late 1970s, when non-roman data was not well supported in computing environments in North America, The Research Libraries Group implemented non-roman capability for input, searching, and viewing of bibliographic data. At the same time the MARC Advisory Group approved a technique for carrying alternate graphic data in multiscript records that allowed North American users to easily identify and isolate fields with non-roman data by using a special tag (880). This technique was useful because few local systems could make use of the non-roman information. While the technique itself is not "roman-centric", its implementation has been so in North America. The technique has been very useful in the early period of implementation of multiscript systems. In the 1990s, more systems are multiscript, the format is used in many places where the vernacular script is not roman, and, with the advent of Unicode, more inclusion of multiscript data in records can be expected. It is not clear that the 880 field technique should be continued in this new era. This paper discusses an alternative.
As a simplification, the discussion below is presented in terms of a record in an environment that is based on the roman script. In that setting the regular tagged fields are roman and the 880 fields contain data in non-roman scripts. In reality, other scripts could be treated as the base script and the roman and other non-base scripts treated in the 880 fields.
In examples in this document, script is designated in field 066 and subfield $6 by name rather than the proper codes that are specified in MARC, e.g., in a MARC record hebrew script should be indicated by the code "(2".
1.2 Current Technique
When non-roman representations of data are co-resident in a record, the non-roman data is currently recorded in 880 fields, then linked to corresponding regular fields containing roman data via the $6 subfield, which also identifies the main script in the 880 field. Subfield $6 is defined to include:
    $6<linking tag>-<occurrence number>/<script code>/<field orientation code>

For records produced in North America, this means the 880 fields contain most of the non-roman characters in a record.
Bibliographic record:

    066 ##$chebrew
    245 10$6880-1$a*Al tofa*at ha-ke'ev /$cRefa'el *Karaso.
    260 ##$6880-2$a[Tel Aviv] :$bMa*tkal/*Ketsin *hinukh rashi/Gale-Tsahal, Mi*srad ha-bi*ta*hon,$c
    880 10$6245-1/hebrew/r$a<Hebrew characters>
    880 ##$6260-2/hebrew/r$a[Tel Aviv] :$b<Hebrew characters>$c

Subfield $6 is inserted in the fields to link the corresponding roman fields containing transliterations of associated non-roman fields. If an 880 field does not contain a transliteration of data in another field, subfield $6 in field 880 carries the data-specific tag for the information, but the link number is 0.
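The linking just described can be sketched in a few lines of Python. This is only an illustration, not part of the format: the names `Link`, `parse_subfield_6`, and `pair_880_fields` are hypothetical, a field is modeled as a `(tag, {subfield code: value})` pair, and script names stand in for the real script codes, as in the examples in this paper.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    tag: str                    # linking tag, e.g. "245" or "880"
    occurrence: int             # occurrence number; 0 means "not linked"
    script: Optional[str]       # script code (a script name in this paper's examples)
    orientation: Optional[str]  # field orientation code, e.g. "r"


def parse_subfield_6(value: str) -> Link:
    """Split '<tag>-<occurrence>/<script>/<orientation>' into its parts."""
    head, *rest = value.split("/")
    tag, _, occurrence = head.partition("-")
    script = rest[0] if len(rest) > 0 and rest[0] else None
    orientation = rest[1] if len(rest) > 1 and rest[1] else None
    return Link(tag, int(occurrence), script, orientation)


def pair_880_fields(fields):
    """Pair each 880 field with the regular field it links to.

    Returns (pairs, unlinked): pairs holds (tag, regular, alternate) triples,
    unlinked holds (tag, alternate) for 880 fields with link number 0 or with
    no matching regular field.
    """
    regular = {}
    for tag, subfields in fields:
        if tag != "880" and "6" in subfields:
            link = parse_subfield_6(subfields["6"])
            regular[(tag, link.occurrence)] = subfields

    pairs, unlinked = [], []
    for tag, subfields in fields:
        if tag != "880":
            continue
        link = parse_subfield_6(subfields["6"])
        key = (link.tag, link.occurrence)
        if link.occurrence == 0 or key not in regular:
            unlinked.append((link.tag, subfields))
        else:
            pairs.append((link.tag, regular[key], subfields))
    return pairs, unlinked


record = [
    ("245", {"6": "880-1", "a": "*Al tofa*at ha-ke'ev /"}),
    ("880", {"6": "245-1/hebrew/r", "a": "<Hebrew characters>"}),
    ("880", {"6": "250-0/hebrew/r", "a": "<Hebrew characters>"}),
]
pairs, unlinked = pair_880_fields(record)
```

Here the first 880 field pairs with the 245 field, while the second (link number 0) is reported as unlinked under its data-specific tag 250.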
As more systems become multiscript, there may be less transliteration in descriptive parts of records. LC, for example, provides much more transliterated data in online records for non-roman scripts now than it did when the Government Printing Office was printing vernacular cards for LC's card catalogs. LC expects to increase inclusion of vernacular data (without transliteration) in records when it has vernacular capability in its OPAC. As a result more unlinked data might be carried in the 880 fields instead of the regular tagged fields. Basic fields like the 250 and 260 fields may be missing from the regularly tagged fields.
1.3 Script Code
In subfield $6, following the slash, the code for the first non-roman character set, and thus script, in the field is indicated. If more than one non-roman set is used, the others are not indicated in the $6. The indication of script in subfield $6 is used for unspecified purposes by the application programs. The information is redundant since the same code in a character set escape code sequence, e.g., "esc)2", appears before every series of non-roman characters in the field. It is incomplete since only the first set is specified at the field level. The actual escape code sequences are used by the machine to process the characters in the proper script, not the script code in the $6.
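The redundancy and incompleteness noted above can be illustrated with a small Python sketch that lists the non-roman sets announced by escape sequences inside field data. It is a simplification under stated assumptions: only the escape forms `esc(F`, `esc)F`, and `esc$F` (optionally `esc$(F` or `esc$)F`) are recognized, the table of final characters is a small subset chosen for illustration, and the names `SETS`, `ESCAPE`, and `non_roman_sets` are hypothetical.

```python
import re

ESC = "\x1b"

# Final characters for a few character sets (illustrative subset only).
SETS = {
    "B": "basic latin",
    "2": "hebrew",
    "3": "arabic",
    "N": "cyrillic",
    "S": "greek",
    "1": "cjk",
}

# Simplified pattern: ESC, then "(" or ")" (single-byte sets) or "$" with an
# optional "(" or ")" (multibyte sets), then one final character.
ESCAPE = re.compile(r"\x1b(?:\$[()]?|[()])([0-9A-Z])")


def non_roman_sets(data: str) -> list:
    """Return non-roman sets in order of first appearance in the field data."""
    seen = []
    for match in ESCAPE.finditer(data):
        name = SETS.get(match.group(1), "unknown")
        if name != "basic latin" and name not in seen:
            seen.append(name)
    return seen


# Placeholder letters stand in for the actual character codes of each set.
field_data = (
    "[Tel Aviv] :" + ESC + "(2" + "xxxx" + ESC + "(B"
    + " = " + ESC + "(N" + "yyyy" + ESC + "(B"
)
sets_found = non_roman_sets(field_data)
```

For this field the escape sequences reveal two non-roman sets, whereas the script code in subfield $6 would name only the first of them.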
In the future, with Unicode, escape codes will not be needed for non-roman characters. There may or may not be a need to mark a field at the subfield level as containing other than the base script of the language of the catalog. If there were, there are several schemas for identifying repertoires of characters that could be considered.
Consideration should be given to normalizing the treatment of non-roman data in records by using regular tags rather than 880 fields. With the 880 technique, as systems become multiscript, an increasing amount of data will be forced into 880 fields, data that has no corresponding roman fields. A more logical way to handle the non-roman data will be in the regularly tagged fields, repeating them when there are transliterated versions of fields. It is interesting to note that virtually all systems that support input of multiscript MARC records now use repeating regular tags rather than 880 fields for displays.
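As a rough illustration of what normalizing to regular tags could involve, the Python sketch below moves each 880 field to the tag named in its subfield $6. It rests on assumptions that go beyond this paper: a field is modeled as `(tag, indicators, [(code, value), ...])`, every 880 field is assumed to carry a $6, the $6 links on the regular fields are simply dropped rather than converted to another linking subfield, and the function name `retag_880_fields` is hypothetical.

```python
def retag_880_fields(fields):
    """Return a new field list in which 880 fields carry their real tags.

    The field orientation code, if present, is kept in a reduced $6 of the
    form '//<orientation>'; the linking tag, occurrence number, and script
    code are discarded.
    """
    converted = []
    for tag, indicators, subfields in fields:
        without_6 = [(code, value) for code, value in subfields if code != "6"]
        if tag != "880":
            # Regular field: drop the $6 that pointed at the 880 field.
            converted.append((tag, indicators, without_6))
            continue

        link = next(value for code, value in subfields if code == "6")
        head, _, rest = link.partition("/")
        target_tag = head.split("-")[0]
        parts = rest.split("/")
        orientation = parts[1] if len(parts) > 1 else ""
        if orientation:
            without_6.insert(0, ("6", "//" + orientation))
        converted.append((target_tag, indicators, without_6))

    # Stable sort by tag: a former 880 field follows the regular field
    # with the same tag.
    converted.sort(key=lambda field: field[0])
    return converted


record = [
    ("066", "##", [("c", "hebrew")]),
    ("245", "10", [("6", "880-1"), ("a", "*Al tofa*at ha-ke'ev /")]),
    ("260", "##", [("6", "880-2"), ("a", "[Tel Aviv] :")]),
    ("880", "10", [("6", "245-1/hebrew/r"), ("a", "<Hebrew characters>")]),
    ("880", "##", [("6", "260-2/hebrew/r"), ("a", "[Tel Aviv] :"),
                   ("b", "<Hebrew characters>")]),
]
retagged = retag_880_fields(record)
```

The result contains two 245 fields and two 260 fields and no 880 fields, with the vernacular occurrences marked only by the reduced `$6//r`.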
2.1 Subfield $6
If transliterated fields need to be linked to corresponding vernacular data, the subfield $8, which has been established as the "universal" field-to-field linking technique, should be used. Use of subfield $6 for this type of linking information would become increasingly confusing in records. With the establishment of subfield $8, an attempt was made to confine $6 to the 880 and associated fields.
The script marker in the $6 subfield is also not needed in records. Specially indicating only one of the scripts that appears in a field is not useful for processing, as is noted above. Thus script coding will probably become obsolete in the future.
The field direction flag might still be needed in some cases. Thus subfield $6 would primarily be used in the future for the directional flag and/or, as proposed in DP 109, the transliteration scheme.
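A minimal Python sketch of how a system might read records under this proposal follows: fields are grouped by the link number in subfield $8, and the directional flag is read from a reduced $6 such as `//r`. The sketch assumes a $8 value of the form `<link number>\<link type>` (for example `2\g`), models a field as `(tag, [(code, value), ...])`, and uses the hypothetical names `group_by_link` and `orientation`.

```python
from collections import defaultdict


def group_by_link(fields):
    """Group fields by the link number at the start of each $8 value."""
    groups = defaultdict(list)
    for tag, subfields in fields:
        for code, value in subfields:
            if code == "8":
                # Keep only the link number: drop the link type after "\"
                # and any sequence number after ".".
                number = value.split("\\")[0].split(".")[0]
                groups[int(number)].append((tag, subfields))
    return dict(groups)


def orientation(subfields):
    """Return the field orientation code from $6, or None if absent."""
    for code, value in subfields:
        if code == "6":
            parts = value.split("/")
            return parts[2] if len(parts) > 2 and parts[2] else None
    return None


record = [
    ("490", [("8", "2\\g"), ("a", "Chung-kuo chin tai shih ts`ung shu")]),
    ("490", [("8", "2\\g"), ("a", "<Chinese characters>")]),
    ("504", [("a", "Includes bibliographical references.")]),
    ("600", [("8", "1\\g"), ("a", "Hung, Jen-Kan,"), ("d", "1822-1864.")]),
    ("600", [("8", "1\\g"), ("a", "<Chinese characters>")]),
]
groups = group_by_link(record)
flag = orientation([("6", "//r"), ("a", "<Hebrew characters>")])
```

Fields without a $8, such as the 504 above, simply fall outside every group, which matches the case of vernacular data that has no transliterated counterpart.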
Examples of Bibliographic records:

    066 ##$chebrew
    245 10$a*Al tofa*at ha-ke'ev /$cRefa'el *Karaso.
    245 10$6//r$a<Hebrew characters>
    260 ##$6//r$a[Tel Aviv] :$b<Hebrew characters> $c
    260 ##$a[Tel Aviv] :$bMa*tkal/*Ketsin *hinukh rashi/Gale-Tsahal, Mi*srad ha-bi*ta*hon,$c

    066 ##$cchinese
    100 1#$aShen, Wei-pin.
    100 1#$a<Chinese characters>.
    245 10$aHung Jen-kan /$ccShen Wei-pin chu.
    245 10$a<Chinese characters> /$c<Chinese characters>.
    250 ##$a<Chinese characters>
    260 ##$a<Chinese characters> :$b<Chinese characters>, $c1982
    300 ##$a136 p.,  leaf of plates :$bill. ;$c19 cm.
    490 1#$82\g$aChung-kuo chin tai shih ts`ung shu
    490 1#$82\g$a<Chinese characters>
    504 ##$aIncludes bibliographical references.
    600 10$81\g$aHung, Jen-Kan,$d1822-1864.
    600 10$81\g$a<Chinese characters>
    651 #0$aChina$xHistory$yTaiping Rebellion, 1850-1864.
    650 #0$aRevolutionists$zChina$xBiography.
    830 #0$83\g$aChung-kuo chin tai shih ts`ung (Shanghai, China)
    830 #0$83\g$a<Chinese characters>

Example of an authority record:

    066 ##$alatin$bextended latin$ccyrillic
    100 1#$81\g$aZemtsovskii, I. I.$q(Izalii Iosifovich)
    400 1#$81\g$a<name in Cyrillic with initials> $q(<qualifier in Cyrillic>)
    400 1#$82\g$a<name in Cyrillic>
    400 1#$82\g$aZemtsovskii, Izalii Iosifovich
    400 1#$aZemtsovskiy, I.

2.3 Summary
The following changes would be needed in the Bibliographic, Holdings, Authority, and Community Information formats:
- Impact: This change would have the greatest impact on the networks such as RLG and OCLC that have
implemented non-roman capability, and on vendor systems that can display multiscript data from MARC
records. The design for those systems may vary a great deal, affecting the impact. In the discussions, could
representatives of those organizations discuss impact on their particular system design, especially looking at:
- Is the indication of script in the $6 subfield used by any systems?
- What is the impact of having fields 245, 250, 260, etc. repeatable for alternate graphic data?