NAME: Repeatability of subfield $u (URL) in field 856 of the MARC formats
SOURCE: GOVDOC-L list; OCLC
SUMMARY: This paper explores situations where subfield $u is repeated in field 856, including recording both a PURL and a URL. It proposes that the subfield be made non-repeatable so that if two URLs need to be recorded, the field is repeated.
KEYWORDS: Field 856 (All formats); Subfield $u, in field 856 (All formats); URI; Electronic Location and Access
RELATED: 94-3 (Jan. 1994)
12/11/98 - Forwarded to the MARC Advisory Committee for discussion at the January 1999 MARBI meetings.
1/31/99 - Results of MARC Advisory Committee discussion - Approved.
4/15/99 - Results of LC/NLC review - Approved.
PROPOSAL NO. 99-06: Repeatability of 856 $u
Field 856 was defined in 1993 to provide a link from the bibliographic record to an electronic resource. At the time, the Uniform Resource Locator (URL) was under development and not completely standardized. The field was originally defined to accommodate the three main Internet access methods at the time: email, FTP, and telnet. Elements required for access using these Internet protocols were parsed into separate subfields. As the use of the URL became increasingly widespread, it was added to the 856 field as subfield $u in Proposal No. 94-3 (Addition of Subfield $u (Uniform Resource Locator) to Field 856 in the USMARC Holdings/Bibliographic Formats).
Field 856 was originally defined as a provisional field, pending experimentation for final approval. Because it was not certain how it would be used, minimal restrictions were put on the field in terms of required subfields and repetition of subfields. Subfield $u was defined as repeatable.
In January 1995, the MARC Advisory Committee discussed Proposal 95-1 (Changes to Field 856 (Electronic Location and Access) in the USMARC Bibliographic Format), which included various changes to field 856. Part of the discussion included general comments on ambiguities in the field. At that time it was requested that LC clarify the definition of subfield $u by adding that the subfield is repeated only when the rest of the data recorded in the field applies to each URL. Reasons to repeat the field would be: different versions/encodings of the same resource (.jpg vs. .gif); different access methods (ftp vs. http); different portions of the resource represented by the URL; related resources.
2.1 Use of repeatable $u
Questions have arisen about when subfield $u should be repeated. Stating that it is repeated only when the rest of the information in the field applies generates questions about when this is applicable, and it does not specify when one is forced to encode a new 856 field. Users have had a hard time interpreting this stipulation. When subfield $u is recorded, it may be the only data in the field. For instance, in the case of mirror sites, where the access method is the same, is subfield $u repeated or the field repeated? It is likely that all the information in the field would apply to both URLs. However, the resource at one of the sites could change, so that repeated fields should be used, although this is not clear from the current guidelines.
A query sent to the USMARC, INTERCAT and GOVDOC-L lists revealed that most users are repeating the field rather than using one 856 field with more than one $u. The one situation that seemed to warrant using a repeatable $u was the practice followed by the Government Printing Office (GPO), which has implemented a PURL server. Catalogers there are updating the OCLC record containing a URL and adding a PURL as another subfield $u. (Note that a Persistent URL, or PURL, is intended to allow for persistence by directing requests to the PURL server, which stores the actual URL and redirects to it.) Because the data in both $u subfields represent and resolve to the same resource, this seems an appropriate use of repeating subfields. However, the practice of encoding more than one $u in a single 856 field has caused problems for some systems.
2.2 Problems with repeatable $u
MARC users have reported that some systems cannot display more than one subfield $u in a field 856. Because GPO (and perhaps other institutions) has used repeated $u subfields, some systems have not displayed all the data in the field for these records. Although this is a problem with particular system implementations of the field, the use of repeatable $u subfields needs to be reconsidered, since more than one system has reported this problem. If the field is intended to provide a hot link to the resource, how can the system decide which of the multiple links in the same field it should link to?
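The hot-link question can be made concrete with a small sketch. It is only an illustration, not the behavior of any particular system: the `Field` type, the function names, the indicator values, and the URLs are all hypothetical, and a field is modeled simply as a tag plus an ordered list of (subfield code, value) pairs.

```python
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    """Simplified stand-in for a MARC variable field (not a real MARC library type)."""
    tag: str
    indicators: str
    subfields: List[Tuple[str, str]]  # (subfield code, value) pairs in record order


def link_targets(field: Field) -> List[str]:
    """Return every $u value in the field, in the order recorded."""
    return [value for code, value in field.subfields if code == "u"]


def has_ambiguous_link(field: Field) -> bool:
    """True when an 856 field carries more than one $u, so no single hot link is implied."""
    return field.tag == "856" and len(link_targets(field)) > 1


# Hypothetical example values; the URLs are placeholders, not real GPO data.
gpo_style = Field("856", "4 ", [
    ("u", "http://purl.example.org/doc/123"),
    ("u", "http://www.example.gov/reports/123.html"),
])
single = Field("856", "4 ", [("u", "http://www.example.gov/reports/123.html")])

print(has_ambiguous_link(gpo_style))  # True
print(has_ambiguous_link(single))     # False
```

A display routine that assumes one link per field has no rule for choosing between the two values returned for `gpo_style`, which is the difficulty described above.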
In messages distributed on the GOVDOC-L list in November 1998, it was suggested that multiple 856 fields be used for the case of recording a PURL and a URL. Since there are many reasons for multiple 856 fields to occur, this could result in more ambiguity if there were many 856 fields in the record. Repeating the field for essentially the same URL would create ambiguity in that the two fields represent the same resource (and essentially the same location). They could not be associated with each other unless subfield $8 were used, but this would make the record even more complex. It is important to note that OCLC users (and perhaps those of other utilities) have reported difficulties adding the 856 field to records because the number of fields in the record would exceed the maximum (again, a system problem, but one that is important to consider).
2.3 Recording PURLs and URLs
A PURL is a URL which is intended to provide a persistent location for the resource. Its persistence depends upon the updating of a PURL database when the location of the resource changes. When recorded in an 856 field, it is intended to allow for persistence so that each record containing the URL need not be updated when the location changes. In OCLC's INTERCAT project, records containing field 856 are selected from WorldCat. Each resource described in the record is assigned a PURL and registered in the PURL database. The PURL is recorded in 856$u and the URL originally in the record is transferred to subfield $z (Public note), because OCLC wanted to retain the information originally input in the record. However, since the PURL is supposed to provide persistent access to the resource, it can be argued that there is no reason to retain the URL that might become invalid.
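The dependence of persistence on database maintenance can be sketched with a toy model. This is not OCLC's PURL software: the `PurlTable` class, its method names, and the URLs are assumptions made for illustration, and a real PURL server answers with an HTTP redirect rather than returning a string.

```python
from typing import Dict


class PurlTable:
    """Toy in-memory stand-in for a PURL database (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl: str, url: str) -> None:
        """Associate a PURL with the resource's current URL."""
        self._targets[purl] = url

    def update(self, purl: str, new_url: str) -> None:
        """Record a new location for an already registered PURL."""
        if purl not in self._targets:
            raise KeyError(f"unregistered PURL: {purl}")
        self._targets[purl] = new_url

    def resolve(self, purl: str) -> str:
        """Return the URL a request for this PURL would be redirected to."""
        return self._targets[purl]


table = PurlTable()
record_856_u = "http://purl.example.org/doc/123"  # value stored in 856 $u (placeholder)
table.register(record_856_u, "http://www.example.gov/old/123.html")

# The resource moves: only the PURL table changes, not the bibliographic record.
table.update(record_856_u, "http://www.example.gov/new/123.html")
print(table.resolve(record_856_u))  # http://www.example.gov/new/123.html
```

If `update` is never called after the resource moves, `resolve` keeps returning the stale location, which is the maintenance dependency noted above.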
Although there has been some confusion about OCLC policy on retaining the URL when a record is updated with a PURL, consultations with OCLC have indicated that the decision is up to the institution, and OCLC does not require that it be retained. Some contributors to the GOVDOC-L discussion on the issue have pointed out that the PURL is only persistent if the institution maintaining it regularly updates the PURL database when URLs change. If it is not updated, retaining the original URL in the record, even though that URL has since changed, at least gives a hint as to the institution responsible for the resource, while the PURL system may be hosted elsewhere. Nonetheless, the retention of the URL should be a question of policy of the inputting agency.
2.4 Making $u non-repeatable
In queries sent to the USMARC and INTERCAT lists, uses of a repeatable subfield $u other than the case of recording both a PURL and a URL have not been specified. Because of the difficulty in providing guidelines for the repetition of subfield $u and the problems of systems displaying more than one, it is proposed that the subfield be made non-repeatable. Thus, situations that require repetition of a URL would result in using a repeated field. In the case of the PURL/URL, if the institution decides to retain the original URL, it could be recorded in subfield $x (Nonpublic note). This would be an appropriate subfield, since it may not be desirable to display the URL to the public because it could cause confusion, but it would allow for its retention in the record. Although OCLC has shifted URLs to subfield $z when assigning PURLs in the INTERCAT project, these records reside only in INTERCAT and have not been distributed that way unless actually changed in WorldCat. Since subfield $z has been used for various purposes (i.e., any kind of note associated with the electronic resource intended for the public), it would be preferable to place URLs in a different subfield.
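The two outcomes described in this section (repeating the field, or keeping the PURL in $u and retaining the original URL in $x) can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed conversion: the function names and URLs are hypothetical, a field is reduced to an ordered list of (subfield code, value) pairs, and copying the non-$u subfields into every resulting field is an assumption that an agency would need to review case by case.

```python
from typing import List, Tuple

Subfields = List[Tuple[str, str]]  # (subfield code, value) pairs in record order


def split_repeated_u(subfields: Subfields) -> List[Subfields]:
    """Return one subfield list per $u; all other subfields are copied into each (assumption)."""
    u_positions = [i for i, (code, _) in enumerate(subfields) if code == "u"]
    if len(u_positions) <= 1:
        return [list(subfields)]
    return [
        [sf for i, sf in enumerate(subfields) if sf[0] != "u" or i == keep]
        for keep in u_positions
    ]


def keep_purl_move_url_to_x(subfields: Subfields, purl: str) -> Subfields:
    """Keep `purl` in $u and re-code every $u with a different value as $x (Nonpublic note)."""
    if ("u", purl) not in subfields:
        raise ValueError("purl is not recorded in $u of this field")
    return [
        ("x", value) if code == "u" and value != purl else (code, value)
        for code, value in subfields
    ]


# Hypothetical field content; the URLs are placeholders.
purl = "http://purl.example.org/doc/123"
field = [
    ("u", purl),
    ("u", "http://www.example.gov/reports/123.html"),
    ("z", "Access restricted to subscribers"),
]

for new_field in split_repeated_u(field):
    print(new_field)
print(keep_purl_move_url_to_x(field, purl))
```

The caller must say which $u is the PURL; nothing in the field itself distinguishes the two values.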
If this proposal is approved, the Guidelines for the Use of Field 856 would be revised to give further clarification on the recording of multiple URLs.
3 PROPOSED CHANGE
In the USMARC Authority, Bibliographic, Classification, Community Information, and Holdings formats:
- Make subfield $u (Uniform Resource Locator) in field 856 (Electronic Location and Access) non-repeatable.