DATE: June 1, 1999


NAME: Nonfiling characters in MARC 21 using the control character technique

SOURCE: USMARC electronic list

SUMMARY: This paper discusses the use of the control character technique to block off nonfiling characters in MARC records. It considers whether the technique should be allowed in any field, rather than a restricted list of fields, and whether it can be applied anywhere in a field/subfield, rather than only at the beginning.

KEYWORDS: Nonfiling characters

RELATED: DP 102 (June 1997); 98-16 (June 1998); 98-16R (January 1999)


6/1/99 - Forwarded to the MARC Advisory Committee for discussion at the June 1999 MARBI meetings.

6/27/99 - Results of the MARC Advisory Committee discussion - General agreement that the technique should be available for use with all fields and subfields (except those with fixed position data) to surround any definite or indefinite articles in an initial position in the field or subfield. A majority seemed to favor use of the technique for interpolated corrections, such as "(sic?)" or "i.e." More investigation was needed for articles appended to the ends of words and for chemical names. The technique should not be allowed for stopwords and extraneous characters, because which terms are treated as stopwords and which characters are treated as extraneous are usually matters of local preference.

DISCUSSION PAPER NO. 118: Nonfiling characters


The MARC Advisory Committee has considered the issue of nonfiling characters in the MARC 21 formats on several occasions. There has been consensus that the current technique of using an indicator value to designate the number of nonfiling characters is not adequate, since it is not available in all fields/subfields that may require setting off nonfiling characters. At its meeting in June 1998, the Committee discussed Proposal No. 98-16 and approved the technique of blocking off nonfiling characters with beginning and ending control characters, using the two characters Hex "X88" and Hex "X89" from ISO 6630. However, there was no clear consensus on the situations to which the technique could and should be applied.

June 1998 discussions of this issue indicated a desire to limit the application of blocking off nonfiling characters to the beginning of fields or subfields, and to specify which fields/subfields would allow for the technique. However, discussion of Proposal No. 98-16R in January 1999 shifted toward a broader application of the technique, allowing it to be used in any field/subfield and not only at the beginning of a data element. There was consensus, however, that a block of nonfiling characters could not cross subfield boundaries.
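
The subfield-boundary constraint lends itself to a mechanical check. The following is a minimal sketch in Python, not a prescribed validation routine. It assumes the two control characters are Hex 88 and Hex 89 as approved in Proposal No. 98-16, that subfields are separated by the MARC subfield delimiter (Hex 1F), and, as a further assumption not settled in the discussions, that nonfiling zones may not be nested. The function and constant names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical and not part of any MARC specification.
NSB = "\x88"                 # begin nonfiling zone (Hex 88, per Proposal No. 98-16)
NSE = "\x89"                 # end nonfiling zone (Hex 89, per Proposal No. 98-16)
SUBFIELD_DELIMITER = "\x1f"  # MARC subfield delimiter


def zones_within_subfields(field_data: str) -> bool:
    """Return True if every nonfiling zone opens and closes inside one subfield."""
    for subfield in field_data.split(SUBFIELD_DELIMITER):
        zone_open = False
        for ch in subfield:
            if ch == NSB:
                if zone_open:      # assumption: nested zones are not allowed
                    return False
                zone_open = True
            elif ch == NSE:
                if not zone_open:  # end marker with no matching begin marker
                    return False
                zone_open = False
        if zone_open:              # zone still open at the subfield boundary
            return False
    return True
```

Under these assumptions, a field whose zone begins in one subfield and ends in the next would be rejected, while a field whose zones are each closed before the next delimiter would pass.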

This paper raises questions that need to be answered in order to establish rules and guidelines for applying the control character technique in cases not covered by Proposal No. 98-16R. The graphics { (for beginning of nonfiling zone) and } (for end of nonfiling zone) have been used in all the following examples to represent the two control characters from ISO 6630.
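
As a rough illustration of what blocking off a nonfiling zone means for a system, the sketch below derives a filing form by dropping each zone together with its markers. It uses the same { and } graphics as stand-ins for the control characters; the function name and the sample title are hypothetical, and an actual system would apply its own filing rules.

```python
import re

# "{" and "}" stand in for the two ISO 6630 control characters, as in this paper's examples.
NONFILING_ZONE = re.compile(r"\{[^{}]*\}")


def filing_form(subfield_data: str) -> str:
    """Drop each blocked-off zone, including its begin and end markers."""
    return NONFILING_ZONE.sub("", subfield_data)


# Hypothetical title used only for illustration.
print(filing_form("{The }annual report"))  # prints "annual report"
```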


2.1. Assumptions

The following rules are assumed in the consideration of blocking off a nonfiling zone:

2.2. Fields using the technique

2.2.1. Fields in Proposal No. 98-16R. The marking of nonfiling zones was limited in Proposal No. 98-16R to the following fields/subfields.

Bibliographic format:

Authority format:

Also these same fields when they occur in the Community Information and Classification formats.

2.2.2. Expansion of fields. Discussion at the Midwinter MARC Advisory Committee meeting suggested that the technique be allowed wherever it is needed. The implications of such broad application need to be considered. Following is an example of an additional situation where the technique might be useful.

The National Library of Medicine wants to suppress data for indexing and sorting in field 440$v (Series statement/Volume number). In this case the control character technique could be used to suppress the volume caption, which currently interferes with sorting/indexing in their system.

440 #0$a International congress series ; $v{no. }1111

If allowed in any field, institutions could use the technique to compensate for limitations and problems in the sorting or indexing capabilities of their individual systems. It is important to consider the impact of doing this in a shared cataloging environment.
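
The 440$v example can be used to sketch how the same stored data would yield two different forms: one for display, which keeps the caption, and one for sorting, which drops it. This is an illustration under stated assumptions rather than a description of any existing system. The { and } graphics stand in for the control characters, the function names are hypothetical, the volume numbers other than 1111 are invented for the example, and sorting the remaining digits numerically is an assumption about local practice.

```python
import re

# "{" and "}" stand in for the two ISO 6630 control characters.
ZONE = re.compile(r"\{([^{}]*)\}")


def display_form(data: str) -> str:
    """Keep the blocked-off text but remove the markers."""
    return ZONE.sub(r"\1", data)


def filing_form(data: str) -> str:
    """Remove the blocked-off text together with its markers."""
    return ZONE.sub("", data)


volume = "{no. }1111"           # 440 $v from the example above
print(display_form(volume))     # prints "no. 1111"
print(filing_form(volume))      # prints "1111"

# Hypothetical additional volumes, sorted numerically on the filing form.
volumes = ["{no. }1111", "{no. }20", "{no. }3"]
ordered = sorted(volumes, key=lambda v: int(filing_form(v)))
print([display_form(v) for v in ordered])
```

A system that receives such a record but does not recognize the markers would have neither form available and would treat the control characters as data, which is one aspect of the shared-environment concern.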

If the technique can be used in any field, rules and guidelines may be needed so that it is applied consistently. Should it be allowed in fields that are subject to authority control? How would this affect systems that validate headings?

2.3. Placement within field/subfield

Another issue not yet fully considered is how the control character technique might be used if it were not limited to the beginning of a subfield (as proposed in Proposal No. 98-16R).

2.3.1. Stopwords. If the control character technique could be used anywhere within a subfield, users may wish to block off words that should be ignored in indexing. Usually systems treat certain frequently used words (such as articles) as "stopwords" so that they are not included in system-created indexes. Having the flexibility to block off words using the technique could be desirable because there are many words that are stopwords in one language and not in another. However, allowing the use of the technique for these may result in inconsistent application from one institution to the next, which in a shared cataloging environment may have serious implications.

700 1# $aHyman, Arthur $d1921- $ecomp $tPhilosophy{ in the} middle ages
[Stopwords are excluded.]

2.3.2. Extraneous characters. The control character technique could be used anywhere within a subfield to ignore characters that may affect sorting. Some systems may ignore such characters automatically, while others may not. Examples might be the musical flat sign or the dollar sign.

2.3.3. Corrections in data. Incorrect data may be corrected in the body of a subfield in cases prescribed by cataloging rules. Examples are "i.e." and "[sic]" in dates or titles. These may be ignored for sorting and indexing by using the control character technique.

245 00 $aNewhart. $pYour homebody till somebody love {(sic?)} you.
245 10$a{Th [sic] }first sufi line / $cBill Bissett.
260 $aBoston :$bTicknor and Co., $c1888 [{i.e.}1889]
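
When a zone falls in the middle of a subfield, removing it can leave doubled or trailing spaces, so a system would probably need to normalize whitespace when building a sort key. The sketch below shows one way this might be done, applied to the data from the three examples above. The { and } graphics stand in for the control characters, the function name is hypothetical, and collapsing whitespace is an assumption about system behavior rather than a rule from the proposals.

```python
import re

# "{" and "}" stand in for the two ISO 6630 control characters.
ZONE = re.compile(r"\{[^{}]*\}")


def filing_key(subfield_data: str) -> str:
    """Remove nonfiling zones, then collapse runs of whitespace and trim the ends."""
    without_zones = ZONE.sub("", subfield_data)
    return " ".join(without_zones.split())


print(filing_key("Your homebody till somebody love {(sic?)} you."))
print(filing_key("{Th [sic] }first sufi line /"))
print(filing_key("1888 [{i.e.}1889]"))
```

Without the whitespace step, the first example would retain two consecutive spaces where the interpolation was removed.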

2.3.4. Articles appended to other words in certain languages. Some languages append the article to the end of a noun. Such an article must be disregarded in sorting and indexing, and it is desirable to block it off using the control character technique.

245 10 $aMaresal{ul} Ion Antonescu
(Romanian title: The Marshal Ion Antonescu)

245 10 $aIdei{te} za istoriiata :$bXX vek
(Bulgarian title: The ideas of the history : 20th century)

2.4. Questions for further consideration

1. Should the control character technique be available in any fields in the format?

2. Should the control character technique be available anywhere in the field, not just in the initial character(s)?

3. What are the implications if allowed in fields with authority control? In subject headings?

4. Are specific guidelines necessary so that the technique will be consistently applied, or is it acceptable for institutions to use it as needed? If guidelines are necessary, how should they be developed?

5. Some previous discussion has suggested limiting the use of the technique to sorting rather than indexing. Is this desirable and/or possible?

6. What implications are there for record exchange if systems use the technique to overcome limitations in their systems?

7. What sort of impact studies should be done to further investigate these issues (recommended in the previous discussion)?

