The Library of Congress >> Especially for Librarians and Archivists >> Standards
HOME >> MARC Development >> Proposals List
DATE: December 20, 2019
NAME: Defining a New Indicator Value for Human-generated Content in Field 883 of the MARC 21 formats
SOURCE: German National Library, for the Committee on Data Formats
SUMMARY: This paper proposes a way that metadata provenance information can be extended in the MARC formats from fully or partially machine-generated metadata to any type of metadata, including intellectually assigned metadata. The approach outlined is the definition of a new value “2” for “Created by a human cataloger” as the first indicator of field 883 in all five MARC formats. The name of the field is to be changed from “Machine-generated Metadata Provenance” to a broader scope, and the name of the first indicator position is to be changed from “Method of machine assignment”, accordingly.
KEYWORDS: Field 883 (All formats); Machine-generated Metadata Provenance (All formats); Field 883, 1st indicator (All formats); Data provenance (All formats); Metadata provenance (All formats)
RELATED: 2018-DP05; 2017-DP05; 2012-03
12/20/19 – Made available to the MARC community for discussion.
01/25/20 – Results of MARC Advisory Committee discussion: Approved, with the following editorial amendments:
04/28/20 – Results of MARC Steering Group review: Agreed with the MAC decision.
Over the last few years, the German National Library has developed different methods of assigning metadata to the description of a resource by means of automated processes. Software has been fed with authority data and trained to analyze an electronic publication so that language information, classification numbers and subject headings can be assigned. In addition, records for different versions of the same content are sometimes able to “learn from each other”: the records are matched, and fields (e.g. classification numbers, subject headings, genre/form and intended audience information) are copied from the record of the printed publication to the record of the electronic version, and vice versa. These assignments are always accompanied by provenance data, i.e. the name of the process creating the data, a confidence value, and a date of assignment.
For quality control reasons, additional fields are sometimes assigned manually, including the data provenance information that the field has been added by a cataloger. Mapping and converting the records from the internal format to MARC 21 then has limitations: while MARC is able to transport metadata provenance information about fully machine-generated or partially machine-generated metadata, it lacks the ability to provide provenance information when a MARC field has been created by a human cataloger.
The German National Library is planning to change the mapping and conversion in this important area, and to adjust the internal logic and its MARC representation as soon as a MARC format solution has been developed and documented. We intend to take significant steps in 2020.
Field 883 is defined in the MARC formats as follows:
883 - Machine-generated Metadata Provenance (R)
First Indicator: Method of machine assignment
# - No information provided/not applicable
0 - Fully machine-generated
Data in linked field was fully machine-generated.
1 - Partially machine-generated
Data in linked field was partially machine-generated.
Second Indicator: Undefined
# - Undefined
$a - Generation process (NR)
$c - Confidence value (NR)
$d - Generation date (NR)
$q - Generation agency (NR)
$x - Validity end date (NR)
$u - Uniform Resource Identifier (NR)
$w - Bibliographic record control number (R)
$0 - Authority record control number or standard number (R)
$1 - Real World Object URI (R)
$8 - Field link and sequence number (R)
Field Definition and Scope
Used to provide information about the provenance of metadata in data fields in the record, with special provision for machine generation. Field 883 contains a link to the field to which it pertains. Intended for use with data fields that have been fully or partially machine-generated, i.e., generated by some named process other than intellectual creation.
Field 883 was introduced into the MARC formats in 2012 as a result of Proposal 2012-03. The field was initially defined for Dewey Decimal Classification data, but it is flexible enough to cover basically every variable MARC field. In 2012, it was assumed that metadata provenance information was needed only for fully or at least partially machine-generated metadata, a limitation that now needs to be reconsidered in light of more recent developments.
In 2017 and 2018 two MARC discussion papers (2017-DP05 and 2018-DP05) approached the issue of metadata provenance from a different angle, focusing on subject headings and their institution level information. After intense discussions by the MARC Advisory Committee, there was no support for a subfield $5 option, but some preference for a field 883 option: "To code local subject information, the inclusion of a new first indicator value of "2" with label “Not machine generated” would avoid problems of backwards compatibility. DNB will consider submitting a proposal focusing upon the option to broaden the existing scope of field 883." (quoted from the results of 2018-DP05).
The approach taken here is to define a new value, e.g. "2", for "Created by a human cataloger" as the first indicator of field 883.
We propose adding a new first indicator value:
2 – Created by a human cataloger
Data in linked field was created by a human cataloger, and is not machine-generated (neither fully nor in part).
The name of the first indicator will have to be changed from “Method of machine assignment” to “Method of assignment”. The name of the field 883 itself will have to be changed from “Machine-generated Metadata Provenance” to just “Metadata Provenance”.
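As a rough illustration of how a consuming application might interpret the first indicator once the new value is defined, the following Python sketch maps indicator values to the labels given in this paper. The names `METHOD_OF_ASSIGNMENT`, `describe_method` and `is_machine_generated` are hypothetical and do not belong to any MARC library.

```python
# Hypothetical lookup table: first indicator of field 883 -> label.
# "#", "0" and "1" are the existing definitions; "2" is the value proposed here.
METHOD_OF_ASSIGNMENT = {
    "#": "No information provided/not applicable",
    "0": "Fully machine-generated",
    "1": "Partially machine-generated",
    "2": "Created by a human cataloger",
}


def describe_method(indicator: str) -> str:
    """Return the label for a first-indicator value of field 883."""
    # Assumption: a blank indicator may arrive as a space in data
    # and is written as "#" in documentation.
    key = "#" if indicator == " " else indicator
    try:
        return METHOD_OF_ASSIGNMENT[key]
    except KeyError:
        raise ValueError(f"undefined first indicator for field 883: {indicator!r}")


def is_machine_generated(indicator: str) -> bool:
    """True for fully or partially machine-generated data (values 0 and 1)."""
    return indicator in ("0", "1")
```

Applications that today treat every 883 field as a statement about machine generation would need a check of this kind once value "2" is in use.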
The Field Definition and Scope section will have to be adjusted accordingly:
"Used to provide information about the provenance of metadata in data fields in the record, with special provision for machine generation. Field 883 contains a link to the field to which it pertains. Intended for use with data fields that have been fully or partially machine-generated, i.e., generated by some named process other than intellectual creation, or created by a human cataloger."
The documentation of subfield $8 (Field link and sequence number) and its field link type "p" does not have to be changed. It reads:
p - Metadata provenance
Used in a record to link a field with another field containing information about provenance of the metadata recorded in the linked field.
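To make the linking mechanism concrete, the following Python sketch splits a $8 value such as `1\p` into its parts. It assumes the general $8 structure of linking number, optional sequence number after a period, and optional one-letter field link type after a backslash. `FieldLink`, `parse_field_link` and `is_provenance_link` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class FieldLink(NamedTuple):
    link_number: int
    sequence_number: Optional[int]
    link_type: Optional[str]


# Assumed shape: <linking number>.<sequence number>\<field link type>,
# where the sequence number and the link type may be absent.
_LINK_PATTERN = re.compile(r"^(\d+)(?:\.(\d+))?(?:\\([a-z]))?$")


def parse_field_link(value: str) -> FieldLink:
    """Split a $8 value into linking number, sequence number and link type."""
    match = _LINK_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a recognisable $8 value: {value!r}")
    number, sequence, link_type = match.groups()
    return FieldLink(int(number), int(sequence) if sequence else None, link_type)


def is_provenance_link(value: str) -> bool:
    """True when the $8 value carries field link type "p" (metadata provenance)."""
    return parse_field_link(value).link_type == "p"
```

Because the link type stays "p" whether the linked 883 field describes machine-generated or human-created data, a parser of this kind would not need to change under the proposal.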
There is some possible overlap between the immediate need and approach outlined in this paper on the one hand, and the concept of "Data Provenance" in Resource Description and Access (RDA) and its potential equivalent(s) in the MARC formats on the other. An “LC-PCC Task Group on Data Provenance in Beta RDA Toolkit” has been created to analyze the instructions in Beta RDA Toolkit regarding data provenance, and to prepare further steps. The new "MARC/RDA Working Group" will work on accommodating RDA and developing MARC solutions, with “Data provenance” as one of the main issues. However, it seems unlikely that the schedule of the MARC/RDA Working Group will allow for the short-term solution that the German National Library needs. The suggestion here is therefore to take a first step, and then to build upon that solution to accommodate the full range of data provenance.
MARC Bibliographic record, cf. http://d-nb.info/1194578314
LDR 03039nam a2200541uc 4500
008 190912s2018 gw |||||o|||| 00||||eng
016 7# $2DE-101$a1194578314
024 7# $2urn$aurn:nbn:de:101:1-2019091212130818131036
040 ## $a1240$bger$cDE-101$d1250
041 ## $81\p$aeng
044 ## $cXA-DE-HH
082 74 $82\p$a658.408$qDE-101$223kdnb
083 7# $83\p$a650$a355$qDE-101$223sdnb
100 1# $aKernchen, Roman$eVerfasser$4aut
245 10 $aCorporate Environmental Responsibility in the Defence Industry: A Driver for Green Innovation?$cRoman Kernchen ; Eyvor Institut
264 #1 $aHamburg$bEyvor Institut$c2018
300 ## $aOnline-Ressource
490 1# $aSchriftenreihe des Eyvor Instituts$vMT-1
583 1# $aArchivierung/Langzeitarchivierung gewährleistet$5DE-101$2pdager
650 #7 $0(DE-588)4115806-4$0https://d-nb.info/gnd/4115806-4$0(DE-101)041158067$aRüstungsindustrie$2gnd
650 #7 $0(DE-588)7697760-2$0https://d-nb.info/gnd/7697760-2$0(DE-101)1001443063$aCorporate Social Responsibility$2gnd
650 #7 $0(DE-588)4201709-9$0https://d-nb.info/gnd/4201709-9$0(DE-101)042017092$aUmweltbezogenes Management$2gnd
650 #7 $0(DE-588)4326464-5$0https://d-nb.info/gnd/4326464-5$0(DE-101)043264646$aNachhaltigkeit$2gnd
710 2# $aEyvor Institut
830 #0 $aSchriftenreihe des Eyvor Instituts$vMT-1
850 ## $aDE-101a$aDE-101b
856 40 $uhttps://nbn-resolving.org/urn:nbn:de:101:1-2019091212130818131036$xResolving-System
856 #0 $uhttps://d-nb.info/1194578314/34$xLangzeitarchivierung Nationalbibliothek
856 4# $qapplication/pdf$uhttps://eyvor.org/corporate-environmental-responsibility-in-the-defence-industry-a-driver-for-green-innovation/$xVerlag$zkostenfrei
883 0# $81\p$amaschinell gebildet$c1,00000$d20190913$qDE-101
883 0# $82\p$amaschinell gebildet$c0,99885$d20190913$qDE-101
883 2# $83\p$aintellektuell gebildet$d20190919$qDE-101
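The following Python sketch shows one way the provenance statements in the example above could be paired with the fields they describe. The record is reduced to the six fields that carry a $8 link and is written as plain tuples rather than read with a MARC library. The names `FIELDS`, `provenance_links` and `provenance_by_field` are hypothetical, and sequence numbers in $8 are ignored for simplicity.

```python
from collections import defaultdict

# Simplified stand-ins for the linked fields of the example record:
# (tag, indicators, list of (subfield code, value)). "#" marks a blank indicator.
FIELDS = [
    ("041", "##", [("8", "1\\p"), ("a", "eng")]),
    ("082", "74", [("8", "2\\p"), ("a", "658.408"), ("q", "DE-101"), ("2", "23kdnb")]),
    ("083", "7#", [("8", "3\\p"), ("a", "650"), ("a", "355"), ("q", "DE-101"), ("2", "23sdnb")]),
    ("883", "0#", [("8", "1\\p"), ("a", "maschinell gebildet"), ("c", "1,00000"),
                   ("d", "20190913"), ("q", "DE-101")]),
    ("883", "0#", [("8", "2\\p"), ("a", "maschinell gebildet"), ("c", "0,99885"),
                   ("d", "20190913"), ("q", "DE-101")]),
    ("883", "2#", [("8", "3\\p"), ("a", "intellektuell gebildet"),
                   ("d", "20190919"), ("q", "DE-101")]),
]


def provenance_links(subfields):
    """Yield the linking number of every $8 subfield with link type "p"."""
    for code, value in subfields:
        if code == "8" and value.endswith("\\p"):
            # Drop the trailing "\p" and any sequence number after a period.
            yield value[:-2].split(".")[0]


def provenance_by_field(fields):
    """Map each linked data field tag to (first indicator, $a, $c) of its 883 field."""
    statements = {}
    linked = defaultdict(list)
    for tag, indicators, subfields in fields:
        values = dict(subfields)  # keeps the last value of a repeated code
        for number in provenance_links(subfields):
            if tag == "883":
                statements[number] = (indicators[0], values.get("a"), values.get("c"))
            else:
                linked[number].append(tag)
    return {
        tag: statements[number]
        for number, tags in linked.items()
        for tag in tags
        if number in statements
    }
```

Calling `provenance_by_field(FIELDS)` pairs fields 041 and 082 with first indicator "0" and field 083 with the proposed value "2", which has no confidence value. Keying the result by tag is a simplification that would not distinguish repeated fields with the same tag.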
This information would be included in the bf:generationProcess property found in the AdminMetadata of a description.
Make the following changes to field 883 (Machine-generated Metadata Provenance) in all five MARC 21 formats (see section 2.2. above for full description):