DISCUSSION PAPER NO. 88

DATE: May 26, 1995
REVISED:

NAME: Defining a Generic Author Field in USMARC

SOURCE: OCLC/NCSA Metadata Workshop

SUMMARY: This paper discusses the options for recording author names in USMARC records that do not use standard cataloging rules. The OCLC/NCSA Metadata Workshop held in Dublin, Ohio in March established a list of core data elements needed for discovery and retrieval of Internet resources ("metadata"). These included "Author" and "Other Agent", which do not distinguish between personal and corporate names. This paper suggests three options for mapping these to USMARC: 1) choose a single already established field to be used for generic author despite formal definitions; 2) relax the definition of 700-711 fields to include a value in the first indicator for "unknown or not specified"; or 3) define a new, repeatable USMARC field for names of authors not formulated according to cataloging rules.

RELATED: DP86 (June 1995)

KEYWORDS: OCLC/NCSA Metadata Workshop; Dublin core data elements; Author

STATUS/COMMENTS:

5/26/95 - Forwarded to USMARC Advisory Group for discussion at the MARBI meetings.

6/26/95 - Results of USMARC Advisory Group discussion - Discussion indicated that there was interest in a proposal for a generic name field with the following characteristics:


DISCUSSION PAPER NO. 88:  Defining a Generic Author Field

1.  INTRODUCTION

     According to the General Introduction to the USMARC Concise
Formats, the USMARC record is composed of three elements: record
structure, content designation, and data content.   Content
designation for bibliographic records is defined in the USMARC
Format for Bibliographic Data.   The data content itself is usually
prescribed by standards outside the USMARC formats, including ISBD,
AACR2, and various thesauri.   For example, USMARC content
designation accommodates a topical Library of Congress subject
heading in a field tagged 650 with a second indicator of value "0". 
 The external document Library of Congress Subject Headings (LCSH),
however, provides the authority for the data content by defining
valid and well-formed topical LC subject headings.

     The USMARC bibliographic format was designed primarily to
support library cataloging, and in particular to support the
Anglo-American Cataloging Rules.  Therefore there are many data
elements defined in USMARC that relate specifically to particular
cataloging constructs.  It is generally accepted that when USMARC
is being used to represent a cataloging record, the cataloging
rules should govern the data content whenever applicable.  When a
USMARC record is created for some purpose unrelated to cataloging,
fields can be used for data congruent with the field definition,
even if the data content is not formulated according to the
cataloging rules.

     For names of authors and other agents responsible for all or
part of the intellectual content of the work, there is a
particularly close relationship between the content designation
defined in USMARC and the cataloging rules.  It often surprises
non-librarians to discover there is no MARC field defined
specifically for author, but rather sets of fields defined for main
and added entries, concepts that exist only within certain
cataloging codes and which also encompass a number of non-authorial
relationships.   Such integral support for cataloging is clearly a
desirable feature in the bibliographic formats, but it also raises
problems when creating bibliographic data for other purposes. For
one thing, since the 1XX and 7XX tag ranges are defined explicitly
in terms of the cataloging concepts main and added entry, it is
difficult to use them in an environment that lacks these concepts.

     A second problem with the 1XX and 7XX content designation in
USMARC is that to properly encode it, one has to know quite a large
number of things, including the author's relation to the work,
whether the name in question is that of a person, corporate body,
or meeting, and, to correctly set the first indicator, the form of
the entry element of the name.   There is no option in USMARC to
choose not to supply any of this information or to indicate that it
is unknown.  This can pose a barrier to use of these fields for
non-cataloging purposes.

     While storing and communicating library cataloging data is
without doubt the predominant use of the USMARC bibliographic
formats,  there are other uses commonly made of the formats which
certainly are to the advantage of the library community.  To name
only a few:

     - A bibliographic record might be used in a library
     acquisitions system for the purpose of creating a purchase
     order.  In this case, the data is not cataloging data, and the
     acquisitions clerk creating the data may neither know the
     rules governing form and choice of entry nor have sufficient
     information in his citation to assign content designation
     congruent with the cataloging rules.

     - Bibliographic records might be created for the purpose of
     generating a set of references (endnotes or footnotes)
     according to some external authority such as the Chicago
     Manual of Style, which has quite different rules for citing
     authors' names than the cataloging standard.   For example,
     the seminar papers published in 1974 under the title Networks
     for research and education: Sharing of computer and
     information resources nationwide had four editors. According
     to the Chicago Manual, the names of all four editors should be
     listed before the title.  According to AACR2, there would be
     no main entry and the name of the first editor only would be
     recorded as an added entry.

     - An increasingly common use of USMARC bibliographic records
     is as a vehicle for metadata created by various communities
     according to various other standards.  For example, the
     Government Information Locator Service (GILS) defines a set of
     GILS Core Elements and specifies that these must be
     represented in three different record syntaxes, one of them
     USMARC.  More recently another standard known as the Dublin
     Core Element set was proposed for describing network
     accessible electronic resources.  The Dublin Core, with only
     minor variations, is also being incorporated into the emerging
     IETF standard for Uniform Resource Characteristics (URC).

     This last use is gaining importance in the networked
environment where libraries are only one player in an increasingly
complex system of information creators, publishers and
disseminators. There is clearly great utility in being able to
represent metadata created according to standards other than AACR
and AACR2 in USMARC.  The data, which is inherently bibliographic
in nature, can then be edited and manipulated by the many existing
software packages for processing MARC bibliographic records, and
the records can be integrated into existing library catalogs and
searched by MARC-based bibliographic retrieval systems.  Both the
library community and the information providers benefit.

     When it is possible to map other metadata element schemes into
MARC content designation, in general the most problematic element
is the author.  The GILS Core Element set has no element for author
in the sense of AACR2, but does have an element for originator
which identifies the originator of the information resource.  This
is by convention mapped to the USMARC 710 field. The x10 was chosen
because GILS originators can be assumed to be government agencies,
and the 7XX block was chosen over the 1XX block because of its
repeatability.

     The Dublin Core, which is more fully described in Discussion
Paper No. 86, contains two data elements for names of entities
responsible for intellectual content, Author and Other Agent, which
are not necessarily governed by cataloging rules and which map only
imperfectly to USMARC 1XX and 7XX fields.  In its simplest form,
the Author element could be recorded simply as:

     AUTHOR = Miller, Bruce

with no indication of the relationship of the author to the work. 
A cataloger converting this data to USMARC is likely to be able to
infer that this is a personal name, but may be less likely to know
whether Mr. Miller is related to the resource in the capacity of
main or added entry.  

     The Dublin Core does allow for qualifiers which, if
extensively used, could provide enough information about a name for
accurate human or even machine mapping to USMARC, assuming the name
was formulated according to AACR.  The "scheme" qualifier can be
used to specify the cataloging ruleset, the "type" qualifier can
specify personal, corporate or other authorship, etc.

     AUTHOR (scheme = AACR2, role = Main Entry, type = Personal,
            form = Single Surname) = Miller, Bruce

However, since the Dublin Core was defined expressly for the
purpose of encouraging metadata creation by non-catalogers, the
likelihood of this information being supplied for the majority of 
objects is low.
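
     The dependence of the mapping on qualifiers can be sketched in
a few lines of Python.  This is only an illustration, not part of
any specification: the function name choose_tag and the qualifier
spellings other than "Main Entry" and "Personal" (for instance
"Added Entry", "Corporate" and "Meeting") are assumptions made for
the sketch.  It selects a 1XX or 7XX tag only when scheme, role and
type are all supplied, and otherwise reports that no tag can be
chosen.

```python
# Tag lookup keyed on (role, type). Qualifier spellings other than
# "Main Entry" and "Personal" are assumptions made for this sketch.
TAGS = {
    ("main entry", "personal"): "100",
    ("main entry", "corporate"): "110",
    ("main entry", "meeting"): "111",
    ("added entry", "personal"): "700",
    ("added entry", "corporate"): "710",
    ("added entry", "meeting"): "711",
}


def choose_tag(qualifiers):
    """Return a 1XX/7XX tag, or None when the qualifiers are insufficient."""
    if qualifiers.get("scheme", "").upper() not in ("AACR", "AACR2"):
        return None  # name is not known to follow the cataloging rules
    role = qualifiers.get("role", "").lower()
    name_type = qualifiers.get("type", "").lower()
    return TAGS.get((role, name_type))  # None if role or type is missing


qualified = {"scheme": "AACR2", "role": "Main Entry",
             "type": "Personal", "form": "Single Surname"}
print(choose_tag(qualified))  # fully qualified name
print(choose_tag({}))         # bare AUTHOR = Miller, Bruce
```

The unqualified case returning no tag is the situation the options
in the next section are meant to address.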


2.  SOLUTIONS

     A possible solution, and the one in common practice now, is
simply to ignore the formal definition of the 1XX and 7XX fields
and choose a single field to be used for authors generically. 
Usually a 700 or 710 is chosen, either to avoid the implication of
main entry or because these fields are legally repeatable.  The
advantages are that this can be easily done, no change to USMARC is
required, and these fields are generally treated appropriately by
relevant software programs.  There are also disadvantages: it is
formally an inappropriate use of the data element, the mapping may
actually be incorrect, and systems cannot tell properly represented
data from improperly represented data.  A significant problem
is that the 1XX and 7XX fields have a close relationship to name
fields in the authorities format, and many local systems are
designed to require some form of linked or unlinked authority
control on name headings, making extensive use of non-standard and
non-authority controlled data in these fields problematic.

     A second option would be to change the USMARC bibliographic
formats to relax the definition of the 700-711 tag range.  Rather
than being described as added entry fields, they could be redefined
as appropriate for names not known to be main entries.  A value for
"unknown or not specified" could be added to the first indicator
position (Type of ... name entry element) [See also DP85] and a
value for "generic name entry" added to the second indicator (Type
of added entry).  This option would still require that personal,
corporate and meeting names be distinguished from each other (which
might be done by inference as for GILS data or by format
recognition) with the concomitant disadvantage that this
distinction would often be subverted or erroneous.  

     A third option would be to define a new,  repeatable USMARC
field explicitly for names of authors not formulated according to
the cataloging rules or contained in an authority file or list. 
This has been done in the format for other types of access points:

     field 653 (Index Term -- Uncontrolled) which may contain
           subject terms that are topical, name, etc. and are not
           authority controlled by an authority file or list, and 
     field 740 (Added Entry -- Uncontrolled Related/Analytical
           Title) which serves the same function for title access
           points.

This "generic author" field would not distinguish between main and
added entry, and would not require the distinction between types of
authors.  Internal content designation would be optional and kept
to a minimum.

     A possible definition of a field for generic author would be:

     720 (R) Author

     Indicators
     First      Type of name
     #     Unknown or not specified
     1     Personal
     2     Corporate
     3     Meeting
     4     Other

     Second     Undefined; contains a blank

     Subfield codes
     $a    Name
     $b    Other information

     720   1#   $aBlacklock, Joseph
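
     As a rough illustration of how little information the possible
720 definition above would require, the following Python sketch
builds the display form of such a field.  The function name
format_720 and the keyword names for the types of name are
hypothetical; the indicator values and subfield codes are those
listed in the possible definition, and "#" stands for a blank as in
the example.

```python
# First indicator values from the possible 720 definition;
# "#" represents a blank.
TYPE_OF_NAME = {
    "unknown": "#",
    "personal": "1",
    "corporate": "2",
    "meeting": "3",
    "other": "4",
}


def format_720(name, type_of_name="unknown", other_info=None):
    """Return a display-form 720 field for a generic author name."""
    ind1 = TYPE_OF_NAME[type_of_name]
    ind2 = "#"  # second indicator is undefined and contains a blank
    field = f"720   {ind1}{ind2}   $a{name}"
    if other_info:
        field += f"$b{other_info}"  # optional "other information"
    return field


print(format_720("Blacklock, Joseph", "personal"))
print(format_720("Miller, Bruce"))  # type of name not specified
```

Only the name itself is mandatory in this sketch; the type of name
defaults to "unknown or not specified".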


3.  QUESTIONS 

1.  No matter which option is used, integrating cataloging and
non-cataloging data in bibliographic systems raises problems for
indexing, display and retrieval.  Are we more likely to want to
integrate "generic" authors with standard name fields or to
segregate them in separate keyword or alphabetical indexes?  Which
option would have the least adverse effect on existing systems? 
Which would give most flexibility in treatment? 

2.  It is possible that we should think of a generic "agent" field
that would allow relationships other than author to be recorded. 
In this case the field name could be "720 (R) Name (or Name --
Uncontrolled)".  The subfield for "other information" could be used
for relator information, or a third subfield defined specifically
for this.

     720   1#   $aVonderohe, Robert$b1934-$eeditor

     720   2#   $aCAPCON Library Network$eauthor

3.  If we define a generic "author" or "agent" field, how much
content designation should be provided?  Would additional
subfielding be useful or impractical if its use was optional?  If,
for example, a Dublin Core element with qualifiers was being
mapped, where would qualifier data be recorded?

4.  In a generic "author" or "agent" field, is there any virtue in
using the second indicator position to indicate type of entry or
form of name?  It might be useful to record if a personal name in
the $a is in inverted form, direct order, or unknown order, so that
alphabetical indexes could be limited to only inverted forms of
names.
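
     To make question 4 concrete, the Python sketch below shows how
an alphabetical index might be limited to inverted names if the
form of name were recorded.  The second indicator values used here
("1" inverted, "0" direct order, "#" unknown) are invented purely
for illustration and are not proposed values; the function name
alphabetical_index is likewise hypothetical.

```python
# Hypothetical second indicator values, invented for this sketch only.
INVERTED, DIRECT, UNKNOWN = "1", "0", "#"


def alphabetical_index(fields):
    """Sort the names whose form of name is recorded as inverted.

    fields: iterable of (second_indicator, name) pairs from 720 fields.
    """
    return sorted(name for ind2, name in fields if ind2 == INVERTED)


fields = [
    (INVERTED, "Miller, Bruce"),
    (DIRECT, "Bruce Miller"),
    (UNKNOWN, "CAPCON Library Network"),
    (INVERTED, "Blacklock, Joseph"),
]
print(alphabetical_index(fields))
```

Names in direct or unknown order are left out of the alphabetical
index in this sketch and would be reachable only by keyword.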


Library of Congress
Library of Congress Help Desk (09/03/98)