The Library of Congress >> Especially for Librarians and Archivists >> Standards
DATE: December 14, 2009
NAME: Encoding URIs for controlled values in MARC records
SOURCE: RDA/MARC Working Group
SUMMARY: This paper suggests recording URIs for controlled values in the subfield appropriate to the value itself, distinguished by angle brackets around the URI, which is a standard way of representing them.
KEYWORDS: URIs; Controlled values; RDA
RELATED: 2009-DP01/1; 2009-DP06/1
12/14/2009 - Made available to the MARC 21 community for discussion.
1/17/10 - Results of MARC Advisory Committee discussion: Some participants were reluctant to experiment with encoding URIs in MARC records because of the large amount of effort needed for systems to support experimentation. Concerns included how to explain the approach, what to get back, and how to define the relationship between a value and a URI. Some were interested in experimenting with a set of test records. Nothing will be finalized until these issues are sorted out, but a document will be prepared with guidelines and examples of how URIs might be used in MARC records so that those who wish to may experiment.
Elements in the MARC format may use different types of value strings, such as text, codes, or URIs for controlled values. Using a URI instead of plain text is particularly applicable when the value of an element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g., a name from the LC Name Authority File or a topic from LCSH) or any other list of controlled codes or terms (e.g., the MARC Code List for Languages). Although URIs are not yet available for values in all of these vocabularies, work is underway to provide them. LC's Network Development and MARC Standards Office is developing a registry service for controlled lists and, in doing so, is establishing URIs both for the lists themselves and for each value on a list. When completed, these will be available at http://id.loc.gov/ and will include the MARC language codes, MARC country codes, MARC relator codes, MARC geographic area codes, ISO 639-2 language codes, and other value lists. Other agencies are also developing URI lists; OCLC's terminologies service is one example. RDA has many vocabularies, and a DCMI Task Group is establishing URIs that identify each value or concept in those RDA vocabularies.
Encoding URIs to represent controlled values was first discussed at the Midwinter 2009 MARBI meeting in discussion paper 2009-DP01/1. The MARC Advisory Committee considered whether to record the applicable URI in the appropriate subfield in place of the value or to define a new subfield for the URI. The preference was to define subfield $1 (one) across fields and formats to enable the encoding of a URI that would replace or supplement the textual or coded value. The topic was presented again at the Annual 2009 meetings in Chicago in another discussion paper, 2009-DP06/1. There was no consensus about the best approach. Some participants were wary of the proposed subfield $1, which would derive its meaning from its position relative to other subfields within the field. Others suggested a new field modeled on field 880 (Alternate Graphic Representation), in which URIs would be recorded in the subfields where the values would otherwise appear, and the field would be linked to the field that contains the literal values. Although a mechanism for recording controlled value URIs is not necessary for implementing RDA, participants expressed interest in having such a method for experimentation. This paper reconsiders the previous discussion and suggests a simpler mechanism for recording URIs that does not require a change to the format. It focuses on relating vocabulary values to their corresponding URIs.
URIs for controlled vocabularies identify a concept, which may take the form of a code (e.g., language code "eng"), a term (e.g., RDA content type "sounds"), or a string that identifies a bibliographic resource (e.g., a subject, name, or name/title heading). Whether a URI is resolvable is a separate issue; URIs may be pure identifiers or resolvable ones.
The URIs for bibliographic resources themselves are not considered here. URIs for resources themselves may be identified in field 856, or, if a resource is a supplemental resource, in subfield $u within other MARC fields.
URIs for controlled values and headings are most likely to be used for the following:
Previous discussions have suggested defining a new subfield $1 (the only subfield code available across all MARC fields and formats) for controlled value URIs, but there is no easy way to link such a subfield to the subfield(s) to which it pertains. One would therefore have to rely on subfield sequence to identify which subfield a URI relates to. Extensive rules would need to be written to specify that sequencing, and library automation systems would need to retain subfield order and understand its significance. The other suggestion, a field similar to field 880 that links to another field to identify the related data element, is a cumbersome technique that requires special processing.
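To make the sequencing problem concrete, here is a minimal Python sketch. It assumes a hypothetical rule that a $1 URI qualifies the subfield immediately before it; the function name `pair_by_sequence` and the tuple representation of subfields are illustrative only and are not part of MARC.

```python
from typing import List, Optional, Tuple

Subfield = Tuple[str, str]  # (subfield code, subfield contents)


def pair_by_sequence(subfields: List[Subfield]) -> List[Tuple[Subfield, str]]:
    """Pair each $1 URI with the nearest preceding non-$1 subfield.

    This encodes a hypothetical positional rule; nothing in the field
    itself records which subfield the URI belongs to.
    """
    pairs: List[Tuple[Subfield, str]] = []
    previous: Optional[Subfield] = None
    for code, value in subfields:
        if code == "1":
            if previous is None:
                raise ValueError("$1 has no preceding subfield to qualify")
            pairs.append((previous, value))
        else:
            previous = (code, value)
    return pairs


field_700 = [
    ("a", "Galway, James."),
    ("4", "prf"),
    ("1", "http://id.loc.gov/vocabulary/relators/prf"),
    ("4", "cnd"),
]
print(pair_by_sequence(field_700))
# [(('4', 'prf'), 'http://id.loc.gov/vocabulary/relators/prf')]

# A system that moves $1 to the end of the field silently re-pairs the URI.
moved = [sf for sf in field_700 if sf[0] != "1"] + [sf for sf in field_700 if sf[0] == "1"]
print(pair_by_sequence(moved))
# [(('4', 'cnd'), 'http://id.loc.gov/vocabulary/relators/prf')]
```

Both versions of the field look equally well formed, so the mispairing would go unnoticed unless every system handling the record preserved subfield order.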
For code and term values, this paper suggests recording the URI in the same subfield that is already defined for a given data element. URIs are identifiable by their syntax, and software commonly recognizes and processes them. A convention could be used to indicate that a string is a URI rather than a literal value: it is suggested that the URI be enclosed in angle brackets, which is a standard way to delimit URIs in text. Instead, or in addition, a mark such as an exclamation point could be placed before the URI. Programmers at LC have indicated that these URIs would be easier to process in the appropriate subfields than under the other alternatives. Therefore, so that subfield order does not determine which value a URI relates to, it is suggested that URIs be recorded in the same subfield as the value, set off by angle brackets or by an exclamation point and angle brackets as noted above. Systems will be able to identify a URI when it follows standard URI syntax and is enclosed in angle brackets.
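A minimal Python sketch of how a receiving system might recognize this convention within a single subfield. The regular expression accepts an optional literal value, an optional exclamation point, and an absolute URI in angle brackets at the end of the subfield; the function name `split_value_and_uri` and the exact pattern are assumptions for illustration, not a specified MARC rule.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: optional literal value, optional "!" marker, then an
# absolute URI (scheme ":" rest) in angle brackets at the end of the subfield.
_VALUE_AND_URI = re.compile(
    r"^(?P<value>.*?)\s*!?<\s*(?P<uri>[A-Za-z][A-Za-z0-9+.-]*:[^<>\s]+)\s*>\s*$"
)


def split_value_and_uri(subfield_text: str) -> Tuple[str, Optional[str]]:
    """Return (literal value, URI) for one subfield; URI is None if absent."""
    match = _VALUE_AND_URI.match(subfield_text)
    if match is None:
        return subfield_text.strip(), None
    return match.group("value").strip(), match.group("uri")


print(split_value_and_uri("eng <http://id.loc.gov/vocabulary/languages/eng>"))
# ('eng', 'http://id.loc.gov/vocabulary/languages/eng')
print(split_value_and_uri("audio disc!<http://RDVocab.info/termList/RDACarrierType/1004>"))
# ('audio disc', 'http://RDVocab.info/termList/RDACarrierType/1004')
print(split_value_and_uri("sd"))
# ('sd', None)
```

Requiring a URI scheme inside the brackets is what lets the parser leave ordinary bracketed text alone; a subfield without a bracketed URI is returned unchanged as a plain value.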
For exchange of records the best practice would be to supply both the URIs and the values in records. Internally systems may store only one or the other or both.
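For exchange, a sending system would write both the value and its URI into the subfield. A minimal Python sketch of that writing side, assuming the angle-bracket convention suggested in this paper; `encode_subfield` and its `bang` flag are hypothetical names.

```python
from typing import Optional


def encode_subfield(value: str, uri: Optional[str] = None, bang: bool = False) -> str:
    """Compose subfield contents carrying a literal value and, optionally, its URI.

    bang=True selects the variant with an exclamation point before the
    bracketed URI; otherwise a single space separates value and URI.
    """
    if uri is None:
        return value
    if "<" in uri or ">" in uri or any(ch.isspace() for ch in uri):
        raise ValueError(f"URI cannot contain spaces or angle brackets: {uri!r}")
    if not value:
        return f"<{uri}>"
    separator = "!" if bang else " "
    return f"{value}{separator}<{uri}>"


print(encode_subfield("eng", "http://id.loc.gov/vocabulary/languages/eng"))
# eng <http://id.loc.gov/vocabulary/languages/eng>
print(encode_subfield("prf", "http://id.loc.gov/vocabulary/relators/prf", bang=True))
# prf!<http://id.loc.gov/vocabulary/relators/prf>
```

A system that stores only the value or only the URI internally could still call a function like this at export time, provided it can look up the missing half.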
Since fixed fields have a specified number of character positions, URIs cannot be added to them. Where a URI must be recorded for data that belongs in a fixed field, a related variable field should be used: the code remains in the fixed field and is repeated, with its URI, in the variable field. For example, the format already specifies that the language code in field 008 (positions 35-37) is repeated as the first code in field 041 (Language Code), so the URI for that code would be recorded in the subfield that repeats the code. All other fixed fields would need to be examined, and the variable fields that relate to them identified, with guidelines where needed.
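The 008/041 pattern can be sketched in Python. Language occupies positions 35-37 of the bibliographic 008; the URI base follows the hypothetical pattern in the 041 example given with the other examples in this paper, and the function names are illustrative.

```python
from typing import List

# Assumed URI pattern, taken from the (hypothetical) 041 example in this paper.
LANGUAGE_URI_BASE = "http://id.loc.gov/vocabulary/languages/"


def language_from_008(field_008: str) -> str:
    """Return the language code in positions 35-37 of a bibliographic 008."""
    if len(field_008) != 40:
        raise ValueError("a bibliographic 008 field has 40 character positions")
    return field_008[35:38]


def build_041_subfield_a(field_008: str, other_codes: List[str]) -> List[str]:
    """Build 041 $a contents: the 008 code first, each code followed by its URI."""
    first = language_from_008(field_008)
    codes = [first] + [code for code in other_codes if code != first]
    return [f"{code} <{LANGUAGE_URI_BASE}{code}>" for code in codes]


# Placeholder 008: only positions 35-37 ("eng") matter for this sketch.
sample_008 = " " * 35 + "eng" + " d"
for contents in build_041_subfield_a(sample_008, ["fre"]):
    print("$a" + contents)
# $aeng <http://id.loc.gov/vocabulary/languages/eng>
# $afre <http://id.loc.gov/vocabulary/languages/fre>
```

The fixed field keeps its three-character code untouched; only the repeated code in the variable field carries the URI.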
Angle brackets are currently used in some MARC data, primarily to show partial runs of multipart items, especially series and serials. The main occurrences are in subfield $3 (Materials specified) of the 490 and 8XX series fields and in subfield $a (Extent) of field 300 (Physical Description). Angle brackets may also indicate the parts to which a characteristic applies in the bibliographic 5XX note fields, especially field 500 (General Note). The specifications for repeating field 260 (Publication, Distribution, etc.) are also expected to use subfield $3 for data that may contain angle brackets. URIs would not be used in these subfields, but careful analysis is needed to determine where URIs would be needed and to ensure that existing angle-bracket data is not mistaken for a URI.
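One way a system might keep existing bracketed data apart from URIs is to accept a bracketed segment only if it parses as an absolute URI. A minimal Python sketch using the standard library's `urllib.parse.urlsplit`; the acceptance heuristic, the helper name `bracketed_uris`, and the sample extent string are assumptions for illustration.

```python
import re
from typing import List
from urllib.parse import urlsplit

_BRACKETED = re.compile(r"<([^<>]*)>")


def bracketed_uris(subfield_text: str) -> List[str]:
    """Return bracketed segments that parse as absolute URIs.

    Heuristic (an assumption, not a MARC rule): accept a segment only if it
    contains no whitespace once trimmed, has a URI scheme, and either has an
    authority (//host) or uses the urn: scheme. Partial-run notation such as
    "<1-3 >" has no scheme and is skipped.
    """
    found = []
    for segment in _BRACKETED.findall(subfield_text):
        candidate = segment.strip()
        if not candidate or any(ch.isspace() for ch in candidate):
            continue
        try:
            parts = urlsplit(candidate)
        except ValueError:  # e.g. a malformed bracketed host
            continue
        if parts.scheme and (parts.netloc or parts.scheme == "urn"):
            found.append(candidate)
    return found


print(bracketed_uris("v. <1-3 >"))  # []
print(bracketed_uris("pda <http://id.loc.gov/vocabulary/sources/pda>"))
# ['http://id.loc.gov/vocabulary/sources/pda']
```

A check like this only reduces the risk of confusion; the field-by-field analysis called for above would still decide where URIs are allowed at all.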
URIs that identify headings, which are often composed of several subfields, would be recorded in subfield $0 (Authority Record Control Number, for headings in authority records) or subfield $w (Bibliographic Record Control Number, for headings in bibliographic records). Those subfields are already defined and used for (local) control number identifiers.
For exchange of records the best practice would be to supply both the URIs and the headings in records. Internally systems may store only one or the other or both.
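Because $0 and $w already hold control numbers, a system reading them under this proposal would need to tell the two kinds of identifier apart. A minimal Python sketch, assuming URIs are bracketed as elsewhere in this paper and control numbers use the familiar "(organization code)number" form; `classify_identifier` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumed forms: "<scheme:...>" for a URI, "(ORG)number" for a control number.
_BRACKETED_URI = re.compile(r"^<\s*(?P<uri>[A-Za-z][A-Za-z0-9+.-]*:[^<>\s]+)\s*>$")
_PREFIXED_NUMBER = re.compile(r"^\((?P<org>[^()]+)\)\s*(?P<number>\S.*)$")


def classify_identifier(text: str) -> Tuple[str, str]:
    """Classify a $0 or $w value.

    Returns ("uri", uri), ("control_number", original text) for the
    "(ORG)number" form, or ("unknown", original text).
    """
    value = text.strip()
    match = _BRACKETED_URI.match(value)
    if match:
        return "uri", match.group("uri")
    if _PREFIXED_NUMBER.match(value):
        return "control_number", value
    return "unknown", value


print(classify_identifier("<http://lccn.loc.gov/n81042545>"))
# ('uri', 'http://lccn.loc.gov/n81042545')
print(classify_identifier("(DLC)n  81042545"))
# ('control_number', '(DLC)n  81042545')
```

Under the best practice stated above, a field exchanged between systems might carry one repetition of each kind, and a receiving system could keep whichever it uses internally.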
Note that some of the following examples use hypothetical URIs where final decisions have not yet been made. Where a subfield carries both a value and its URI, the URI follows the value within that subfield; URIs for headings are recorded in subfield $0. (Two of the examples show the option of also using an exclamation point to separate the URI from the value.)
Heading and relator URIs
700 1# $aGalway, James. $0<http://lccn.loc.gov/n81042545> $4prf <http://id.loc.gov/vocabulary/relators/prf> $4cnd
700 1# $aGalway, James. $0<http://lccn.loc.gov/n81042545>$4prf!
Heading and relator URIs
110 2# $aUniversity of Texas. $bDept. of Anthropology. $0<http://lccn.loc.gov/n86041077> $4spn <http://id.loc.gov/vocabulary/relators/spn>
Subject Heading URI
650 #0 $aWorld Wide Web.
Source List code and organization code URIs
583 1# $awill transform digitally $c20031104 $iOCR $zqueued for digitization, Nov. 4, 2003 $2pda <http://id.loc.gov/vocabulary/sources/pda> $5NIC
Carrier type text, code, and code list URIs
338 ## $aaudio disc <http://RDVocab.info/termList/RDACarrierType/1004>
338 ## $aaudio disc!<http://RDVocab.info/termList/RDACarrierType/1004>$bsd!
Language code URI
041 1# $aeng <http://id.loc.gov/vocabulary/languages/eng> $hger
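The examples above use a display convention ($ for the subfield delimiter, # for a blank indicator). A minimal Python sketch of reading such a line and pairing each value with its URI, assuming the angle-bracket convention proposed in this paper (with the optional exclamation point); `parse_display_field` is a hypothetical helper, not a function of any existing MARC library.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: optional value, optional "!", then "<absolute URI>" at the end.
_VALUE_AND_URI = re.compile(
    r"^(?P<value>.*?)\s*!?<\s*(?P<uri>[A-Za-z][A-Za-z0-9+.-]*:[^<>\s]+)\s*>\s*$"
)

ParsedSubfield = Tuple[str, str, Optional[str]]  # (code, value, uri)


def parse_display_field(line: str) -> Tuple[str, str, List[ParsedSubfield]]:
    """Parse a display-form field such as '041 1# $aeng <uri> $hger'.

    Assumes the display conventions of these examples: tag, two indicator
    characters, then subfields introduced by '$'. Real MARC records use the
    0x1F delimiter instead, and a literal '$' inside data is not handled.
    """
    tag, indicators, rest = line.split(maxsplit=2)
    subfields: List[ParsedSubfield] = []
    for chunk in rest.split("$")[1:]:
        if not chunk:
            continue
        code, text = chunk[0], chunk[1:].strip()
        match = _VALUE_AND_URI.match(text)
        if match:
            subfields.append((code, match.group("value").strip(), match.group("uri")))
        else:
            subfields.append((code, text, None))
    return tag, indicators, subfields


tag, ind, subs = parse_display_field(
    "041 1# $aeng <http://id.loc.gov/vocabulary/languages/eng> $hger"
)
print(tag, ind)  # 041 1#
for code, value, uri in subs:
    print(code, repr(value), uri)
# a 'eng' http://id.loc.gov/vocabulary/languages/eng
# h 'ger' None
```

Because each URI travels inside the same subfield as its value, the pairing survives any reordering of subfields, which is the property this paper relies on.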
( 12/21/2010 )