The Library of Congress >> Especially for Librarians and Archivists >> Standards
HOME >> MARC Development >> Discussion Paper List
DATE: May 26, 2022
NAME: Adding Subfields $0 and $1 to Fields 720 and 653 in the MARC 21 Bibliographic Format
SOURCE: PCC Standing Committee on Standards
SUMMARY: This paper presents a case for adding subfields $0 (Authority record control number or standard number) and $1 (Real World Object URI) to fields 720 (Added Entry-Uncontrolled Name) and 653 (Index Term-Uncontrolled) for uncontrolled data in the MARC 21 Bibliographic Format.
KEYWORDS: Field 720 (BD); Added Entry-Uncontrolled Name (BD); Field 653 (BD); Index Term-Uncontrolled (BD); Subfield $0, in field 720 (BD); Subfield $0, in field 653 (BD); Authority record control number or standard number (BD); Subfield $1, in field 720 (BD); Subfield $1, in field 653 (BD); Real World Object URI (BD)
05/26/22 – Made available to the MARC community for discussion.
06/29/22 – Results of MARC Advisory Committee discussion: The paper was met with general support, but with a desire expressed for clarity around the definition of "uncontrolled" terms. In response, the use case was given of having partial data available in a non-thesaurus source, but not sufficient data to develop a controlled heading. This solution would bridge that gap to point to what source is available. There were further discussions of how the control subfield $0 should be formatted and the utility of subfield $2 for recording the source of other standard numbers. The subfield $2 discussion ultimately resulted in a straw poll that yielded results in favor of its inclusion in fields 720 and 653. The paper will return as a proposal.
The ability to associate an authorized name or subject heading in a MARC record with an identifier or dereferenceable URI was established by the introduction of $0 (Authority record control number or standard number) and $1 (Real World Object URI). In a linked data environment, however, many sources do not record a preferred heading or label. It is the identifier or URI that fixes the identity of the entity being referenced, with the label supplied mainly to aid human readability. Unlike authorized forms in library authority systems, such labels are not necessarily either stable or unique. In traditional library terms these labels are not controlled. In the course of the recent PCC URIs in MARC pilot, participants encountered a number of use cases where it was desired to reference entities from such sources. Wikidata and ISNI are two examples of sources that pilot participants wished to use.
The MARC Bibliographic format makes provision for uncontrolled names and subjects in fields 720 (Added Entry-Uncontrolled Name) and 653 (Index Term-Uncontrolled) respectively. $0 and $1 are not currently defined for these fields. However, terms from nontraditional sources arguably fit well into these fields. Adding an identifier or URI subfield to these fields would enable them to be associated unambiguously with the relevant entities.
As noted above, Wikidata itself does not create authorized access points, only labels that can be changed, updated, or swapped and need not be unique. As we move into real identity management and incorporate more data sources into our descriptions, we will need better ways to use the data present in non-NAF sources. There has long been a desire to reduce the amount of work involved in authority control, especially for entities that exist in sources like Wikidata and ISNI but are not established in the NAF. If $0 and $1 were allowed in the 720, catalogers could use a label from the source vocabulary and the URI as an alternative to creating a NACO record for that entity. This is especially helpful for entities that do not necessarily meet the requirements for an NAR but do warrant an explicit entry in the bibliographic record.
In recent discussions a wide range of use cases that would benefit from this approach have come to light. It can, for example, reduce the workload involved for catalogers working with materials in foreign languages. For electronic serials in particular, there are often Wikidata entries for issuing bodies but not enough information for the cataloger to confidently establish an NAR.
Theses and dissertations present an opportunity to incorporate URIs in fields that are otherwise not controlled. These works often discuss specialized or emerging subjects that are not yet established in LCSH and other "traditional" library vocabularies but do have URIs available. These terms are often added to the 653 field, in addition to other author-supplied keywords. Moreover, many institutions have workflows that make it impracticable to create national authority records for authors of theses and dissertations, but those institutions can take advantage of existing ORCID and other identifiers. Beyond serving an immediate identity management function, adding the $0 or $1 to the 720 would facilitate later authority work if and when the same author publishes again.
High-volume archival and special collections resources would similarly benefit from an identity management approach to access. For instance, pursuing authority work for names of buildings and structures is often resource intensive. Frequently, extensive research is conducted only to conclude that the available information about the entity is too incomplete to pursue authority work. The ability to include uncontrolled names of significant structures and prominent landmarks—and include a URI to alternative sources such as the National Register of Historic Places NPGallery Database or SAH (Society of Architectural Historians) Archipedia—increases access while reducing the procedural burden of controlling forms of names.
Enhancing MARC support for external sources also facilitates collaboration with partners outside traditional libraries. In recent discussions with the Buddhist Digital Resource Center (BDRC), Harvard and Columbia catalogers learned that the BDRC, which has previously mapped its names to 720, is capturing associated VIAF IDs and is able to output them in MARC. However, the absence of $1 in 720 is an impediment to doing this. At the time of writing, BDRC is investigating the possibility of parsing its names for output to 7XX, but being able to give VIAF URIs in $1 would considerably simplify the transformation process.
It should be noted that MARC already allows identifiers from nontraditional sources such as ISNI and Wikidata to be given in 1XX/6XX/7XX access point fields. The PCC Linked Data Best Practices final report envisages the use of $1 in a traditional access point field in two types of cases: (a) where it is desired to associate a Real World Object (RWO) URI from an external source with a name or subject that is already established in the library's authority file, and (b) where a name is not established in the library's authority file but nevertheless conforms to its conventions for heading construction (e.g., in using a last name, first name form for Western names). Those remain valid uses and the present paper does not suggest making any changes to those fields.
However, many cases are better served by fields outside the traditional 1XX/6XX/7XX access point fields. The point may be illustrated by reference to 6XX subject fields. There are two main reasons why it is problematic to use a standard 610, 611, 630, 647, 648, 650, or 651 field to encode a subject from a nontraditional source such as Wikidata. The first is that it requires the cataloger to know what the correct MARC field is based on the type of entity. For automated processes, it might not be easy to assign the correct tag. In addition, even during manual metadata creation there may be times where the entity being cited does not easily fit into one of the 6XX categories noted above and thus a less rigid 653 would be more appropriate. Since linked data entities are described in their RDF and do not rely on MARC coding to identify their type, the use of the less rigid 653 field fits well with a linked data approach. The second and more fundamental reason why using traditional subject added entry fields can be problematic is that terms drawn from such sources are often neither unique nor stable, which is to say they are not "controlled", as librarians understand the term.
In January 2019, the MARC Advisory Committee (MAC) considered Discussion paper 2019-DP03 from the German National Library on the handling of subjects of an unspecified entity type, i.e., of a type that could not be identified as falling into one of the entity types (personal, corporate, geographical, topical, etc.) designated by the existing 6XX fields. At its meeting MAC concluded that headings of the kind that motivated the discussion paper, from the Bavarian Library Network's Gnomon thesaurus, were indeed "unspecified" but were nevertheless "controlled". Because these subjects were considered to be controlled, MAC concluded that 653 was not appropriate for headings of this type. A new field, 688 (Subject Added Entry-Type of Entity Unspecified) was defined instead. The MAC discussion also raised, but did not pursue, the possibility of a corresponding 7XX field for names of an unspecified entity type that were nevertheless controlled.
The 720 and 653 fields are explicitly designated as uncontrolled. They share the characteristic of 688 of referring to entities whose type cannot always be determined. This can particularly be an issue with data mapped from an external source rather than being inspected on a case-by-case basis by an individual cataloger. (Both fields make limited, but optional, provision to designate entity type via indicator values.) However, they differ from 688 in that the labels are uncontrolled. The terms controlled and uncontrolled are difficult to define rigorously in a MARC context, in part because there is some looseness in the way 1XX/6XX/7XX are defined: they accommodate not only terms that are explicitly established, but also terms that conform to accepted conventions for heading construction, which is a lower bar to clear. However, a term that is uncontrolled will tend to be characterized by the absence of a preferred label that is unique within the specified vocabulary, and the labels that are given are not necessarily stable. The current definitions of 720 and 653 reflect that traditional understanding of "control":
Field 720: Added entry in which the name is not controlled in an authority file or list. It is also used for names that have not been formulated according to cataloging rules.
Field 653: Index term added entry that is not constructed by standard subject heading/thesaurus-building conventions.
These definitions are still applicable to the uses under discussion, since the language pertains to the label and not its identification with an entity.
The possibility of defining identifier subfields for 720 and 653 is consistent with these definitions because, in a linked data context, a label can be uncontrolled in the library authority sense but nevertheless be associated with an entity through the provision of an identifier.
Two further MARC coding options should be noted, one of them in the traditional 6XX block and the other in the recently introduced 758 field.
6XX second indicator 4 is defined for headings with "source not specified", and it may initially appear that this indicator value could be used for uncontrolled subjects. However, the definition makes it clear that these headings differ from other 6XX access points only in that there is no code defined for the source vocabulary in the MARC Subject Heading and Term Source Codes list or implied by a value in the second indicator. In principle such a code can be requested. The definition describes these as controlled headings in contradistinction to uncontrolled terms in 653.
In preliminary comments on this discussion paper, the authors were asked to address the potential use of the 758 (Resource Identifier) field to accommodate subject relationships. While it is true that 758 is defined very broadly and is not intrinsically limited to any particular kind of relationship, that is because it is designed to be flexible enough to accommodate the variety of data models that one may expect to see in linked data statements. The use cases that motivated the 758 proposal were concerned with recording work-to-instance, or what are sometimes called primary relationships. The values that are expected to appear in the $4 predicate position are relationships such as bf:instanceOf. These are different in nature from subject relationships, which are already well catered for in the 6XX fields, or 653 in the case of uncontrolled terms.
Currently neither $0 nor $1 is defined for 720 and 653. The addition of $0 does raise a minor technical issue regarding the definition of $0. Appendix A currently states:
Subfield $0 contains the system control number of the related authority or classification record, or a standard identifier. These identifiers may be in the form of text or a Uniform Resource Identifier (URI).
If added to 720 and 653, $0 will never refer to the system control number of a related authority, since if an authorized name were available it would be recorded in a standard 1XX/6XX/7XX field. An authorized subject string would similarly be recorded in a 6XX access point. But $1 accommodates only Real World Object (RWO) URIs, and some identifier schemes use only alphanumeric strings. There may also be applications that prefer to use the alphanumeric form of an identifier even where a corresponding RWO URI is available. For these reasons it may be advantageous to include $0 in 720 and 653 to carry standard alphanumeric identifiers.
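The division of labor between $1 and $0 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `source_code` parameter, and the `rwo` flag are hypothetical, and the parenthetical source prefix simply follows the form used in this paper's own examples, such as (discogs)a312098. Whether a given URI actually identifies a Real World Object is a judgment the caller must make; the sketch does not attempt to determine it.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def is_http_uri(value: str) -> bool:
    """Return True if the value looks like an http(s) URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def identifier_subfield(
    value: str, source_code: Optional[str] = None, rwo: bool = True
) -> Tuple[str, str]:
    """Choose a subfield code for an identifier in an uncontrolled field.

    Assumption: the caller states via `rwo` whether a URI identifies a
    Real World Object. RWO URIs go to $1; every other identifier goes to $0.
    """
    if is_http_uri(value):
        return ("1" if rwo else "0", value)
    if not source_code:
        raise ValueError("a source code is needed for a non-URI identifier")
    # Alphanumeric identifier: prefix the source in parentheses, as in the
    # examples given in this paper.
    return ("0", f"({source_code}){value}")


print(identifier_subfield("http://www.wikidata.org/entity/Q19760388"))
print(identifier_subfield("a312098", source_code="discogs"))
```

An application that prefers the alphanumeric form even where an RWO URI exists would simply pass the alphanumeric string and its source code, which always yields $0.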
Inevitably there are system implications to consider. Labels given in uncontrolled fields will usually be indexed in current discovery systems, but are unlikely to be included in browse indexes. Since labels recorded here will not reliably be formulated according to authority conventions, this is arguably more an advantage than a drawback. (Indeed, in one possible implementation scenario, $0 or $1 could be given without an accompanying label in $a; a suitable label could subsequently be pulled in and populated either into the source MARC record or into the discovery layer to serve the needs of a particular audience.) Excluding uncontrolled terms from indexes for 1XX/6XX/7XX access points does have the benefit of removing clutter from headings maintenance routines. Names that can be assumed to follow authority conventions can continue to be coded 1XX/6XX/7XX at the discretion of the cataloging agency. The question remains how to integrate linked data sources such as those envisaged in the use cases discussed here into library discovery. However, that is a much broader issue that will need to be pursued outside the confines of any specific MARC proposal.
In field 720 of the MARC 21 Bibliographic Format, add subfields $0 and $1 (in bold) as follows.
In field 653 of the MARC 21 Bibliographic Format, add subfields $0 and $1 (in bold) as follows.
720 ## $a Tshul khrims rin chen $1 http://viaf.org/viaf/22550486
720 1# $a Liliana Essi $1 http://www.wikidata.org/entity/Q19760388
720 ## $a Kevin Gray $0 (discogs)a312098
720 2# $a The Other Baby $4 prn $0 (imdb)co0776444
653 ## $a Russian invasion of Ukraine $1 http://www.wikidata.org/entity/Q110999040
653 #4 $a Early Jurassic Epoch $1 http://n2t.net/ark:/99152/p09qtgw32q7
653 ## $a Melbourne General Post Office $1 http://www.wikidata.org/entity/Q6811781
653 ## $a Bagras Castle $0 (pleiades)786609869
653 ## $a AcousTech Mastering $0 (discogs)l265260
653 #5 $a Saco Lake $0 (gnis)872606 $1 https://sws.geonames.org/5092045/
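To show how a system might pick the identifiers out of fields coded as in the examples above, the following Python sketch parses the display form used in this paper (tag, two indicator characters, then $-prefixed subfields). It is a minimal illustration, not a MARC parser: the `Field` type and both function names are hypothetical, and it assumes that subfield values never contain a "$" followed by a code character and a space.

```python
import re
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    tag: str
    ind1: str
    ind2: str
    subfields: List[Tuple[str, str]]  # (code, value) pairs in original order


def parse_field(line: str) -> Field:
    """Parse a display-form field such as '720 ## $a Name $1 http://...'."""
    tag, indicators, rest = line.strip().split(" ", 2)
    if len(tag) != 3 or len(indicators) != 2:
        raise ValueError(f"unexpected tag or indicators in: {line!r}")
    # Splitting on '$x ' markers yields ['', code, value, code, value, ...].
    tokens = re.split(r"\s*\$(\w)\s+", rest)
    subfields = list(zip(tokens[1::2], tokens[2::2]))
    return Field(tag, indicators[0], indicators[1], subfields)


def identifiers(field: Field) -> List[Tuple[str, str]]:
    """Return only the $0 and $1 subfields, in their original order."""
    return [(code, value) for code, value in field.subfields if code in ("0", "1")]


example = parse_field(
    "653 #5 $a Saco Lake $0 (gnis)872606 $1 https://sws.geonames.org/5092045/"
)
print(example.tag, identifiers(example))
```

Keeping subfields as an ordered list of pairs, rather than a dictionary, preserves repeated subfields and their sequence, which matters when a field carries both $0 and $1.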
Adding $1 to 720 could reduce the data loss in BIBFRAME to MARC conversion. The current mapping from BF contribution agent to MARC 720 does not allow carrying over the URI since there is at present no $1 or $0 in 720.
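The effect on conversion can be illustrated with a small Python sketch. It does not reproduce the actual BIBFRAME to MARC mapping: the function name and its parameters are hypothetical stand-ins for an agent's label and URI, and the first indicator values follow the existing 720 definition (# not specified, 1 personal, 2 other). With $1 available, the URI is carried over when present instead of being dropped.

```python
from typing import Optional


def agent_to_720(
    label: str, uri: Optional[str] = None, personal: Optional[bool] = None
) -> str:
    """Build a display-form 720 field from a simplified agent description.

    Assumption: `label`, `uri`, and `personal` are illustrative stand-ins
    for data taken from a BIBFRAME contribution agent.
    """
    # First indicator of 720: type of name.
    ind1 = {True: "1", False: "2", None: "#"}[personal]
    parts = [f"720 {ind1}#", f"$a {label}"]
    if uri:
        # With $1 defined for 720, the agent URI survives the conversion.
        parts.append(f"$1 {uri}")
    return " ".join(parts)


print(agent_to_720("Liliana Essi", "http://www.wikidata.org/entity/Q19760388", personal=True))
```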
6.1. Do you agree that it should be possible to associate an uncontrolled name or subject with a linked data entity?
6.2. Does adding $1 to 720 and 653 provide a satisfactory way to make that association? If not, are there alternative approaches that should be considered?
6.3. Should $0 also be defined for non-URI identifiers? If so, will it be necessary to amend the definition of $0 in Appendix A?
6.4. Are there relevant differences between names and subjects that should lead us to define $0 and $1 for one but not the other?
6.5. Are there any potential consequences that this paper does not address?