The Library of Congress >> Especially for Librarians and Archivists >> Standards

MARC Standards


MARC DISCUSSION PAPER NO. 2018-DP08

DATE: May 25, 2018
REVISED:

NAME: Use of Field 024 to Capture URIs in the MARC 21 Authority Format

SOURCE: PCC Task Group on URIs in MARC

SUMMARY: This paper discusses the need to capture Uniform Resource Identifiers (URIs) in field 024 (Other Standard Identifier) of the MARC 21 Authority Format in a manner that clearly differentiates between:

  1. URIs that identify a "Record" or "Authority" entity describing a Thing (e.g. madsrdf:Authorities, SKOS Concepts for terms in controlled or standard vocabulary lists) and,
  2. URIs that directly identify a Thing itself (sometimes referred to as a Real World Object or RWO, whether actual or conceptual).

The paper further considers using separate MARC subfields to differentiate identifiers that are alphanumeric standard numbers or codes, already accommodated in 024 $a, from dereferenceable HTTP URIs, which promote the conversion of MARC data to linked data.

Note: Standard vocabulary terms from controlled lists, such as MARC lists, are not generally considered Authority "records"; however, when those terms are represented as SKOS concepts and assigned actionable/dereferenceable URIs, they do carry with them "record"-like data in a particular vocabulary scheme. The latter are referenced in this paper as Authority "records" in conjunction with more traditional Authorities in a record format.

KEYWORDS: Field 024 (AD); Other Standard Identifier (AD); Uniform Resource Identifier (AD); URIs

RELATED: 2017-08; 2017-DP01

STATUS/COMMENTS:
05/25/18 – Made available to the MARC community for discussion.

06/24/18 – Results of MARC Advisory Committee discussion: MAC, in agreement with PCC, preferred the definition of field 024 $0/$1 (Option 2) over the other alternatives explored in the paper. The discussion paper will return as a proposal, focusing on Option 2.


Discussion Paper No. 2018-DP08: Use of Field 024 to Capture URIs

1. BACKGROUND

In approving MARC Proposal 2017-08, MAC established a distinction, reflected in the definitions of $0 and $1, between identifiers for authorities and URIs for RWOs. The PCC URI Task Group has identified one further place where it is necessary to make that distinction, namely in the 024 field in the Authority format.

The current definition of field 024 in the MARC Authority format is as follows:

Field Definition and Scope: Standard number or code associated with the entity named in the 1XX field which cannot be accommodated in another field (e.g., fields 020 (International Standard Book Number) and 022 (International Standard Serial Number)). The source of the standard number or code is identified in subfield $2 (Source of number or code).
Subfields in this field are defined for consistency with field 024 in the MARC 21 Format for Bibliographic Data.

Experiments by the PCC URI Task Group and others in converting MARC 21 to linked data suggest that there are major benefits to storing URIs in MARC 21. That said, the Resource Description Framework (RDF), the recommended encoding for linked data, requires more semantic precision than the 024 field in MARC 21 currently contains. This paper argues that changes to the 024 field in the Authority format (similar to the recent changes to the subfield $0 and addition of the $1 [see 2017-DP01, 2017-08]) are an important prerequisite for the conversion to linked data.

A scope note: The Uniform Resource Locator (or URL) is another important type of URI, which provides addresses for human-readable websites, documents, or web pages. But since the focus of this paper is linked data designed for machine consumption, document URLs are out of scope. URLs and the use of $u to capture them are not part of this discussion paper.

2. DISCUSSION

2.1. URIs and the Semantic Web

According to linked data design principles [COOL URIs, https://www.w3.org/TR/cooluris/], the semantic web infrastructure relies on the unique identification of entities—or, in semantic web terms, "Real World Objects" (RWOs), or, even more colloquially, "Things." For example, a Person and a MARC 21 Authority record about the person are different RWOs, or Things, and each needs to be uniquely identified with distinct URIs for semantic clarity.

RDF statements about a living person may include lifespan dates or a home address, which would be accessible from a URI that functions somewhat like a Social Security number. But an authority record is fundamentally different because it is an information object that may contain a description of a person, as well as a revision history and other facts about the record itself. Although this difference may seem pedantic, it is important for making precise statements about library resources. When we state in a machine-understandable form that “William Shakespeare is the author of Hamlet,” we want to ensure that the reference is to the person who lived from 1564 to 1616, and not to an authority record or similar document. In short, a person can be an author, but a record cannot.

The following diagram illustrates the semantic differences between Records (Authorities or skos:Concepts) about Things/RWOs and Things/RWOs themselves.

The minting of multiple URIs for the same or similar things is inevitable. To help align these duplicative entities, there are established semantic web conventions for mappings between the same or similar entities. The decision regarding which relationships to use to map different entities depends on the type of entities being mapped and the degree of confidence with respect to their equivalencies. The following table describes select properties used to link different URIs for the same or similar entities.

Select mapping properties (property, description, notes):

owl:sameAs
  Description: “...indicates that two URI references actually refer to the same thing”
  Notes: Because the two things are exactly the same thing, everything said about one entity is true about the other. This property needs to be used with care to avoid unintended statements caused by reasoning on the data. It can lead to messy data if the two things are not in fact the same.

rdfs:seeAlso
  Description: “...used to indicate a resource that might provide additional information about the subject resource… When such representations may be retrieved, no constraints are placed on the format of those representations.”
  Notes: A very loose mapping property. This property might be used to link to any type of entity remotely close to the subject entity.

skos:exactMatch
  Description: “...is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications.”
  Notes: Limited to linking skos:Concepts, with higher degree of precision than skos:closeMatch. This property needs to be used with care to avoid unintended statements caused by reasoning on the data.

skos:closeMatch
  Description: “...is used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications.”
  Notes: Limited to linking skos:Concepts, with lower degree of precision than skos:exactMatch.

schema:sameAs
  Description: “URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website.”
  Notes: Similar to rdfs:seeAlso, a loose mapping property that is not reserved for URIs for exactly the same thing.

foaf:focus
  Description: “The underlying or 'focal' entity associated with some SKOS-described concept.”
  Notes: Specifically links a skos:Concept to an entity (RWO) that the Concept/Authority describes.

In RDF, if you say that two entities are the same as each other using the common RDF property owl:sameAs, then everything stated about one entity is also true of the other. This can lead to messy data if the two things are not in fact the same. For instance, two authority records from different national authority files describing the same person are not the same resource. Each authority record has unique traits: different dates of creation and/or of modification, different sources of information, different processes asserted on them, etc. Therefore, rather than asserting that the two authority records are owl:sameAs, we want to assert that the focus of each authority record is the same Person, which is identified by the URI for the Person/RWO. URIs that directly identify a Person provide a bridge between different authority records focusing on the same Person. More directly, the two authorities could be related using skos:exactMatch or skos:closeMatch depending on the circumstance, or even looser properties rdfs:seeAlso or schema:sameAs. The foaf:focus property is designed to align SKOS terms to their RWO equivalents.
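The bridging role of an RWO URI can be illustrated with a minimal Python sketch that models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library. All three example.org URIs and the helper name `authorities_sharing_focus` are hypothetical and introduced here only for illustration.

```python
from collections import defaultdict

FOAF_FOCUS = "http://xmlns.com/foaf/0.1/focus"

# Hypothetical URIs: two authority records from different authority files,
# and one URI that directly identifies the Person (the RWO).
AUTH_A = "http://example.org/file-a/authority/123"
AUTH_B = "http://example.org/file-b/authority/456"
PERSON = "http://example.org/person/1"

# Each authority asserts that its focus is the Person. Neither authority
# is asserted to be owl:sameAs the other.
triples = [
    (AUTH_A, FOAF_FOCUS, PERSON),
    (AUTH_B, FOAF_FOCUS, PERSON),
]


def authorities_sharing_focus(triples):
    """Group authority URIs by the RWO URI given as their foaf:focus."""
    by_focus = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == FOAF_FOCUS:
            by_focus[obj].add(subject)
    # Keep only RWOs that bridge two or more authorities.
    return {rwo: sorted(auths) for rwo, auths in by_focus.items() if len(auths) > 1}


bridged = authorities_sharing_focus(triples)
```

In this sketch the two records are linked only through the shared Person URI, so record-level facts such as creation dates stay attached to each record separately.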

2.2. Current Use of 024 in MARC 21 and Conversion to RDF

As described in MARC Discussion Paper No. 2017-DP01 and MARC Proposal No. 2017-08, in the MARC 21 format it is critical to distinguish RWO URIs from URIs for Authorities and skos:Concepts to allow for meaningful conversions to other formats, such as RDF.

The Authority 024 is currently defined as, “Standard number or code associated with the entity named in the 1XX field which cannot be accommodated in another field …” It allows for capturing any external identifiers (both URIs and non-URI identifiers) for the thing described in the record not provisioned for elsewhere in the MARC Authority record. The 024 in the MARC 21 Authority format, like the definition of the $0 prior to the Proposal 2017-08, lacks a machine actionable way to make meaningful mapping assertions from various datasets when converting MARC Authority Records to RDF. In order to accomplish this, we need the 024 to allow disambiguation among standard numbers and codes that are not machine actionable, URIs for machine actionable Authority/Concepts, and RWO URIs.

2.3. Proposed Options for Adding Dereferenceable URIs to MARC Authority 024


OPTION 1
Define the 024 second indicator in the Authority format to denote whether a URI in the corresponding subfield $a refers to an Authority describing a Thing, or directly refers to the Thing. Paralleling the $0 and $1 throughout the MARC format, a second indicator value 0 could denote a URI for an Authority, and a second indicator value 1 could denote a URI for a Thing.  In this option, the value in $a could be an alphanumeric standard number or code or could be a machine actionable/dereferenceable URI.

The "blank" second indicator value for no information provided should be retained to prevent invalidating existing data. Since the second indicator would be defined as a URI indicator, the value blank would need to be redefined as "no URI present or no information provided on type of URI".
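As a rough illustration of how a conversion program might read the second indicator under Option 1, the following Python sketch maps each indicator value to a category. The function name and the category labels are hypothetical, and representing a blank indicator as either "#" or a space is an assumption of this sketch.

```python
def classify_024_option1(indicator2, subfield_a):
    """Classify a 024 $a value using the Option 1 second-indicator values."""
    if indicator2 == "0":
        # 0: the URI in $a identifies an Authority describing a Thing.
        return ("authority_uri", subfield_a)
    if indicator2 == "1":
        # 1: the URI in $a directly identifies the Thing (RWO).
        return ("rwo_uri", subfield_a)
    if indicator2 in ("#", " "):
        # Blank: no URI present, or no information provided on type of URI.
        return ("unspecified", subfield_a)
    raise ValueError(f"undefined second indicator: {indicator2!r}")
```

Because a blank indicator covers both "no URI" and "URI of unknown type", a program following this sketch would still need to inspect the $a value itself for legacy data.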

OPTION 2
Retain the current definition of 024 $a for non-URI standard numbers or codes and define $0 and $1 using definitions similar to those given in Proposal 2017-08 (and now reflected in the MARC Authority control subfield definitions in Appendix A), but restricted to URIs providing machine-actionable or parseable data.

It is worth noting that there are already fields in the MARC Authority format (043, 052), Community Information format (043) and Classification format (043) with $a for a code, $0 (Authority record control number or standard number), and $1 (Real World Object URI).

This option would require URIs currently captured in the $a to be moved to the $0 or $1.
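The move of URIs out of $a might look something like the following Python sketch. It assumes a field is represented as a list of (subfield code, value) pairs and that any value beginning with http:// or https:// is a URI. The decision whether a given URI identifies an RWO is left to a caller-supplied function, `is_rwo_uri`, because this paper does not prescribe how that decision is made; both names are hypothetical.

```python
def migrate_024_subfields(subfields, is_rwo_uri):
    """Return a copy of one 024 field with URIs moved from $a to $0 or $1.

    subfields:  list of (code, value) pairs, e.g. [("a", "..."), ("2", "uri")]
    is_rwo_uri: callable returning True when a URI identifies a Thing (RWO)
    """
    migrated = []
    for code, value in subfields:
        if code == "a" and value.startswith(("http://", "https://")):
            # URI found in $a: re-tag it as $1 (RWO) or $0 (Authority).
            code = "1" if is_rwo_uri(value) else "0"
        # Non-URI $a values and all other subfields are left untouched.
        migrated.append((code, value))
    return migrated
```

For example, with a caller-supplied rule that treats Wikidata entity URIs as RWOs, `[("a", "http://www.wikidata.org/entity/Q611833"), ("2", "uri")]` would become `[("1", "http://www.wikidata.org/entity/Q611833"), ("2", "uri")]`.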

OPTION 3
Deprecate the 024 $a and replace it with subfield $0, and add $1 (using the definitions for $0 and $1 described in MARC Authority Appendix A).

This option simplifies the $a/$0 distinction by combining them, and it would allow for subfield synchronization between the Authority and Bibliographic formats, but it would require legacy data to be moved from the $a to $0, with a parenthetical code prefix or a $2 included. However, as in other areas of MARC where the $0 is defined to hold both non-URI and URI identifiers, it complicates RDF conversion programs, which will have to test $0 to determine whether it holds a machine-actionable URI or an alphanumeric standard number or code that is not dereferenceable.
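The kind of test a conversion program would need under Option 3 can be sketched in Python as follows. The sketch assumes that a value starting with http:// or https:// is a dereferenceable URI and that any other value is an identifier, optionally carrying a parenthetical source prefix such as (saam). The function name and the returned dictionary keys are hypothetical.

```python
import re

# Matches a parenthetical source prefix followed by the identifier itself.
PREFIXED = re.compile(r"^\((?P<source>[^)]+)\)(?P<value>.+)$")


def parse_subfield_0(value):
    """Decide whether a $0 value is an HTTP URI or a non-URI identifier."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return {"kind": "uri", "source": None, "value": value}
    match = PREFIXED.match(value)
    if match:
        # Identifier with a parenthetical code prefix, e.g. "(saam)3597".
        return {"kind": "identifier", "source": match["source"], "value": match["value"]}
    # Bare identifier; its source, if any, would come from $2.
    return {"kind": "identifier", "source": None, "value": value}
```

Under Option 2 this test would be unnecessary, since $0 and $1 would hold only URIs and $a only non-URI numbers or codes.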

OPTION 4
Redefine the 024 $a with the $0 definition and add $1 as defined in MARC Authority Appendix A.

This option simplifies the $a/$0 distinction by combining them, but would require programs to know to treat the 024 $a similarly to the subfield $0 elsewhere in the MARC 21 format, with a parenthetical code prefix or a $2 included. It would also complicate RDF conversion programs, which will have to test $a to determine whether it holds a machine-actionable URI or an alphanumeric standard number or code that is not dereferenceable.

Note on all options: Authority URIs gathered in the 024 should be an exact match of the base MARC Authority, and RWO URIs are recommended to be the primary focus of the base MARC Authority. Such scoping of URIs in the 024 negates any need to introduce $4 in the MARC Authority 024. It allows $0 URIs to reliably be considered a skos:exactMatch to the base Authority, and 024 $1 URIs to reliably be considered the foaf:focus of the base Authority.
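The mapping stated in this note can be sketched in Python, again modelling triples as plain tuples. The sketch assumes the Option 2 arrangement, in which $0 and $1 hold only URIs, and it assumes the base Authority has a URI of its own that is supplied by the caller. The function name is hypothetical.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
FOAF_FOCUS = "http://xmlns.com/foaf/0.1/focus"


def triples_for_024(base_authority_uri, subfields):
    """Build triples for one 024 field given as (code, value) pairs."""
    triples = []
    for code, value in subfields:
        if code == "0":
            # $0: an Authority URI that exactly matches the base Authority.
            triples.append((base_authority_uri, SKOS_EXACT_MATCH, value))
        elif code == "1":
            # $1: the RWO that is the primary focus of the base Authority.
            triples.append((base_authority_uri, FOAF_FOCUS, value))
        # $a and $2 produce no triples in this sketch.
    return triples
```

Because the relationship is fixed by the subfield, no per-field relationship code is consulted, which is the sense in which such scoping removes the need for $4.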

Note on scope: The reasons that this paper limits its scope to the Authority format are:

  1. While the MARC Authority 024 scope note indicates that “subfields are defined for consistency” with the 024 field in MARC Bibliographic, the two fields do not have identical definitions:
    1. MARC Authority 024: “Standard number or code associated with the entity named in the 1XX field which cannot be accommodated in another field (e.g., fields 020 (International Standard Book Number) and 022 (International Standard Serial Number)).”
    2. MARC Bibliographic 024: “Standard number or code published on an item which cannot be accommodated in another field (e.g., field 020 (International Standard Book Number), 022 (International Standard Serial Number), and 027 (Standard Technical Report Number)).”

    These differences are sufficient reason not to synchronize this proposal for the Authority 024 with the 024 fields in other MARC formats. There is a vast difference between a URI that appears on a resource and a URI associated with the entity in the Authority 1XX field.

  2. What a Bibliographic record describes (and therefore what the 024 URI identifies) can be ambiguous because it captures Work, Publication, Item, Agent, etc. information, while the Authority record clearly describes just the entity named in the 1XX field. It is clear that these two 024 fields are not the same and do not have the same focus.

If there is agreement that the Authority format and Bibliographic format should diverge with respect to the 024, the Authority definition will need to have the following language removed, "Subfields in this field are defined for consistency with field 024 in the MARC 21 Format for Bibliographic Data."

2.4. Definition of Subfield $2

With the expansion of 024 to include URIs under any of the options outlined in 2.3., it may be advisable to make a corresponding change in the definition of $2. $2 is currently defined as follows:

$2 - Source of number or code
MARC code that identifies the source of the number or code. Used only when the first indicator contains value 7 (Source specified in subfield $2). Code from: Standard Identifier Source Codes.

This definition can be understood as referring to the value recorded in 024 $a. If one or more additional subfields are defined to contain a URI identifying an authority or a record, or alternatively a RWO, then the source should be understood as referring to the subfield containing the URI. Accordingly, it may be appropriate to amend the definition as follows:

$2 - Source
MARC code that identifies the source of the number, code, or URI. Used only when the first indicator contains value 7 (Source specified in subfield $2). Code from: Standard Identifier Source Codes.

This change, if adopted, would require a change in the source code assigned to URIs. In the existing MARC source code lists, the source code given to URIs, irrespective of vocabulary, is “uri”. This has the anomalous result that an ISNI given as a string value (for example) has a different source code from the same ISNI given as a URI. The use of (uri) as a prefix identifying the source of URIs in $0 was discontinued with the approval of MARC Discussion Paper 2016-DP18. If a similar principle is adopted for $2, then $2 would be available to designate the actual source vocabulary in use. Examples are given as indicated below.

3. EXAMPLES

3.1. Option 1

Pattern:
024 [ind1]0 $a Authority URI
024 [ind1]1 $a RWO/Thing URI
024 [ind1]# $a no URI present or no information provided on type of URI

Examples:

1a) 024 70 $a http://viaf.org/viaf/145862829 $2 uri
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘uri’]

1b) 024 70 $a http://viaf.org/viaf/145862829 $2 viaf
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘viaf’ (the vocabulary source code)]

2) 024 80 $a http://www.gamemetadata.org/media/1059
[dereferenceable HTTP URI for an Authority/skos:Concept, no source identified]

3) 024 71 $a http://www.wikidata.org/entity/Q611833 $2 uri
[dereferenceable HTTP RWO/Thing URI, source identified, using ‘uri’]

4) 024 7# $a 8462832856536435 $2 isni
[alphanumeric standard number, source identified]

3.2. Option 2

Pattern:
024 [indicators] $a non-URI standard number or code $0 Authority URI $1 RWO/Thing URI

Examples:

1) 024 7# $a 85270357 $2 viaf
[alphanumeric standard number, source identified]

2) 024 7# $a 85270357 $1 http://viaf.org/viaf/85270357 $2 viaf
[alphanumeric standard number and dereferenceable HTTP RWO/Thing URI both given, source identified]

3) 024 8# $0 http://id.loc.gov/authorities/names/n85387872
[dereferenceable HTTP URI for an Authority, no source identified]

4a) 024 7# $0 http://vocab.getty.edu/ulan/500082105 $2 uri
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘uri’]

4b) 024 7# $0 http://vocab.getty.edu/ulan/500082105 $2 gettyulan
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘gettyulan’ (the vocabulary source code)]

5) 024 7# $1 http://www.wikidata.org/entity/Q611833 $2 uri
[dereferenceable HTTP RWO/Thing URI, source identified, using ‘uri’]

3.3. Option 3

Pattern:
024 [indicators] $0 Standard number, code, or Authority URI $1 RWO/Thing URI

Examples:

1) 024 7# $0(saam)3597
[alphanumeric standard number/code for an Authority with parenthetical code prefix]

2) 024 7# $0 http://vocab.getty.edu/tgn/2069433 $2 gettytgn
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘gettytgn’ (the vocabulary source code)]

3) 024 8# $1 http://dbpedia.org/resource/Mario_Bellini
[dereferenceable HTTP RWO/Thing URI, no source identified]

3.4. Option 4

Pattern:
024 [indicators] $a Standard number, code, or Authority URI $1 URI for an RWO/Thing

Examples:

1) 024 7# $a 3597 $2 saam
[alphanumeric standard number/code for an Authority, source identified]

2a) 024 7# $a http://vocab.getty.edu/tgn/2069433 $2 uri
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘uri’]

2b) 024 7# $a http://vocab.getty.edu/tgn/2069433 $2 gettytgn
[dereferenceable HTTP URI for a madsrdf:Authority/skos:Concept, source identified using ‘gettytgn’ (the vocabulary source code)]

3) 024 8# $1 http://dbpedia.org/resource/Mario_Bellini
[dereferenceable HTTP RWO/Thing URI, no source identified]

4. BIBFRAME DISCUSSION

The recommendations proposed in this document will facilitate the conversion of MARC records to BIBFRAME. The distinction between Thing and Authority URIs is consistent with the BIBFRAME 2.0 model.

5. QUESTIONS FOR DISCUSSION

5.1. Is there agreement on the need to make a distinction between Things (RWOs) and Authorities in the 024 field of the MARC Authority Format (similar to the $0/$1 distinction made in the already approved MARC Proposal No. 2017-08)?

5.2. Of the options laid out in the discussion paper, is there a preference? Is there another option not included in the discussion paper that should be considered?

5.3. If an option is chosen that would require values from the $a to be moved, what strategies would need to be taken to migrate related values?

We don't want the success of this proposal to have a chilling effect on the provision of data in $0/$1, or to prompt the wholesale deletion of existing data "because it might be wrong". Tools will need to be created to validate URIs to make sure they are in their proper place.
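A validation tool of the kind mentioned here might start from something like the following Python sketch. It assumes the Option 2 arrangement and a lookup table from URI namespace to expected subfield. The table entries are hypothetical placements chosen only to agree with the examples in section 3.2, not an authoritative registry, and the function name is likewise hypothetical. The sketch reports problems and never deletes or moves data.

```python
# Hypothetical table: the subfield in which URIs from each namespace belong.
EXPECTED_SUBFIELD = {
    "http://id.loc.gov/authorities/": "0",
    "http://vocab.getty.edu/": "0",
    "http://www.wikidata.org/entity/": "1",
}


def check_024_placement(subfields):
    """Return a list of messages for one 024 field given as (code, value) pairs."""
    problems = []
    for code, value in subfields:
        is_uri = value.startswith(("http://", "https://"))
        if code == "a" and is_uri:
            problems.append(f"$a holds a URI: {value}")
        elif code in ("0", "1"):
            if not is_uri:
                problems.append(f"${code} holds a non-URI value: {value}")
            for prefix, expected in EXPECTED_SUBFIELD.items():
                if value.startswith(prefix) and code != expected:
                    problems.append(f"${code} {value}: expected ${expected}")
    return problems
```

URIs from namespaces missing from the table pass without comment, so unfamiliar data is left in place for human review rather than flagged as wrong.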

5.4. Should the definition of $2 be amended to include the source vocabulary for an identifier regardless of whether it is expressed as a string value or a URI?


(08/17/2018)