PROPOSAL NO: 96-1

DATE: December 1, 1995
REVISED:

NAME: Changes to Field 856 (Electronic Location and Access) in the USMARC Formats

SOURCE: Library of Congress

SUMMARY: This paper proposes two changes to Field 856. The first suggests the addition of a first indicator (Access method) value 8 for Other, to be used when a Uniform Resource Locator (URL) is recorded in subfield $u. The second change is a redefinition of subfield $q (File transfer mode) as File format type. This would result in recording the type of file, or MIME type, in subfield $q, instead of the current definition that requires recording "ASCII" or "binary" to indicate what mode of transfer is necessary.

KEYWORDS: Field 856 (Bibliographic/Holdings/Classification); Electronic Location and Access; Subfield $q, in field 856 [Bibliographic/Holdings/Classification]; File transfer mode; File format type; Access method

RELATED:

STATUS/COMMENTS:

12/1/95 - Forwarded to USMARC Advisory Group for discussion at the 1996 Midwinter MARBI meetings.

1/21/96 - Results of USMARC Advisory Group discussion - Rejected. Because OCLC uses the data in 856 $2 to display the access method in the INTERCAT catalog, participants felt that saving a few keystrokes was not worth the impact of such a change. As to the proposed change to subfield $q, concern was expressed that the need for file format to be explicit may be a temporary situation, and that in the future files may become more self-defining. It was suggested that it would be better to wait and see whether this change is still needed in the future, since no specific need has been demonstrated.

2/15/96 - Results of final LC review - Agreed with the MARBI decision.


PROPOSAL NO. 96-1: Changes to Field 856 (Electronic Location and
Access)

1.      BACKGROUND

        Field 856 (Electronic Location and Access) was initially
developed and approved by the USMARC Advisory Group in January
1993.  At that time, the Internet Engineering Task Force was
finalizing the draft standard for a locator, the Uniform Resource
Locator (URL).  During discussions of field 856, participants
agreed that the field should enable a system to create a "hot link"
to allow for the transfer of a file, the connection to another
host, or the initiation of an email message through information
recorded in the field.  

        Since field 856 was published in the USMARC Format for
Bibliographic Data, and as the field has come to be used in a
growing number of records for electronic resources, users have
gained more experience with it.  This proposal considers two
possible changes to field 856.  The first is proposed as a result
of comments received over a period of time concerning redundancy in
the use of subfield $2 when the access method is recorded as part
of the URL in subfield $u.  The second is the redefinition of
subfield $q so that file format type can be given as part of the
information in the field.


2.      ACCESS METHOD
        The first indicator in field 856 was defined as Access method
and the values were defined to represent the three main TCP/IP
protocols used on the Internet.  Other access methods were to be
recorded with a first indicator value of 7 to indicate that the
information is recorded in subfield $2 (Access method).  At the
time it was clear that other access methods were being developed,
and it was impossible to predict how the list might grow.  Because
the URL had not yet been fully developed or come into wide use,
separate subfields were defined for recording all pieces of information that
were needed for a system to provide the appropriate link depending
upon the access method used.

        In January 1994 two proposals were presented to enhance field
856.  One of these, Proposal 94-3 (Addition of Subfield $u (Uniform
Resource Locator) to Field 856 in the USMARC Holdings/Bibliographic
Formats) defined a subfield $u for a URL.  The URL standard,
Uniform Resource Locators (URL) (RFC 1738), a product of the
Uniform Resource Identifiers Working Group of the IETF, stipulates
that a URL begins with an access scheme.  Now that
the URL has become a de facto international standard for locating
resources on the Internet, many records are being created with only
a URL in subfield $u of field 856, rather than parsing the data
into separate subfields.  When a URL is used in the field, it is
currently necessary to set the first indicator to value 7 and fill
in subfield $2 with an indication of the access method that does
not have a specific value defined.  (This technique is common in
variable fields of the MARC formats, indicating that the
information that does not have its own value in the indicator can
be found in subfield $2.)  In the case of URLs, the first portion
before the "://" containing the access scheme is repeated in
subfield $2.  The information in subfield $2 is then essentially
redundant, since it is part of the URL itself. Catalogers using
MARC for description of Internet resources have suggested that this
redundancy results in additional unnecessary keying.

        An alternative would be to define indicator values for the
most commonly used access methods.  The advantage to this approach
would be the ability to use an indicator value for retrieval.  The
disadvantage is that there are only four (or three if 8 is defined
as Other) indicator values available.  Values 0, 1, 2, 3, and 7 are
now defined; values 4, 5, 6, and 8 are still available.  Value 9 is
generally used locally; value 8 is usually used as "Other".  Access
methods that might need their own values include news, http, and gopher.
If all the values are used, then it would not be possible to define
additional values when other access methods become available.

        If a value 8 for Other were defined in field 856 (another
common technique in the formats), it would not be necessary to
provide additional information in subfield $2.  Only the URL would
then need to be recorded in subfield $u and no other subfields
would be required.  If such a change were approved, the question
arises as to what to do with existing records.  Would they need to
be changed to set the first indicator to 8 and take out subfield
$2?  If the situation is a file available by FTP, and there is an
appropriate first indicator value (value 1), then the subfield $2
would not be filled in, but the information in the URL would
contain the initial "ftp".  In this case the first indicator would
supply the information about access method, even though the
information is already in the URL in subfield $u.  In other cases,
where there is no specific indicator value, the information would
not be specified except in $u.  In cases where separate subfields
have been used instead of a URL, the appropriate value would be
recorded in the first indicator, with subfield $2 used only when the
first indicator value is 7.
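The indicator choice discussed above can be sketched in Python. This is a minimal illustration, not part of the proposal: it assumes a field built from a URL alone, represents subfields as (code, value) pairs, and maps the "mailto" scheme to value 0 (Email); the function names are hypothetical. It contrasts current practice (value 7 plus a redundant $2) with the proposed value 8.

```python
from urllib.parse import urlsplit

# URL schemes that already have their own first indicator value in field 856.
# Assumption: "mailto" corresponds to value 0 (Email). Dial-up (value 3) has
# no URL scheme, so it never appears here.
SCHEME_TO_IND1 = {"mailto": "0", "ftp": "1", "telnet": "2"}


def current_practice(url):
    """Return (ind1, subfields) under the current definition of field 856."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SCHEME_TO_IND1:
        return SCHEME_TO_IND1[scheme], [("u", url)]
    # Value 7: the access method is keyed again in $2, repeating the scheme
    # that already begins the URL in $u.
    return "7", [("u", url), ("2", scheme)]


def proposed_practice(url):
    """Return (ind1, subfields) if value 8 (Other) were defined."""
    scheme = urlsplit(url).scheme.lower()
    # No $2 is needed: a specific value is used when one exists, otherwise 8.
    return SCHEME_TO_IND1.get(scheme, "8"), [("u", url)]
```

For an http URL, the current practice yields value 7 with $2 "http", while the proposed practice yields value 8 with only $u; an ftp URL gets value 1 either way.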

        OCLC has established a searchable database of MARC records for
Internet resources called Intercat.  Participants in the project
contribute records and the database is available through the World
Wide Web.  It is important to note that OCLC is using subfield $2
for display of the access method.  After a search, a brief record
display includes "Electronic access:" and the data in subfield $2;
the display of the full record includes "Mode of access: [data in
$2]" and "Location: [data in $u]".  Consequently, this proposed
change, although requested by catalogers who do not want to key
redundant data, would have an impact on the OCLC Intercat catalog. 
If the first indicator were set to 8, OCLC would need to change its
program to extract the access scheme from the beginning of the URL
in order to display it as the catalog currently does.
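A display program could derive the access method under the proposal roughly as sketched below. The function and its labels are hypothetical and do not describe OCLC's actual Intercat software; subfields are assumed to be (code, value) pairs with $u and $2 each occurring at most once.

```python
from urllib.parse import urlsplit

# Hypothetical display labels for the first indicator values that name
# an access method directly.
IND1_LABELS = {"0": "Email", "1": "FTP", "2": "Remote login (Telnet)", "3": "Dial-up"}


def mode_of_access(ind1, subfields):
    """Return the text for a "Mode of access:" line, or None if unknown."""
    codes = dict(subfields)
    if ind1 == "7":
        # Current practice: the access method is keyed in $2.
        return codes.get("2")
    if ind1 == "8" and "u" in codes:
        # Proposed value 8: take the scheme from the beginning of the URL.
        return urlsplit(codes["u"]).scheme or None
    return IND1_LABELS.get(ind1)
```

The only new logic the proposal would require is the value 8 branch; records that still carry value 7 and $2 display as before.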


3.      FILE FORMAT TYPE
        When the field for electronic location was initially being
discussed, participants agreed that the information should include
whatever was needed for interaction with the resource to take
place.  If the resource described in the record was available by
telnet, the information should enable a connection; if the resource
was available by email, it should enable the initiation of an email
message; if by FTP, it should enable the transfer of a file.  One
piece of information that was deemed by participants to be required
for FTP was whether the file is transferred as ASCII or binary. 
Thus subfield $q was defined as File transfer mode.

        In the past few years, the availability of all types of
resources over the Internet has exploded.  Now, the World Wide Web,
which was only under development when enhancements were made to
MARC to accommodate description of Internet resources, has allowed
for the integration of multimedia resources.  Software that is
necessary for display of digitized images or playing of digital
audio files is activated depending upon the file format.  Often the
file extension indicates the type of file and thus whether it should
be transferred in binary or ASCII mode (ASCII mode, the FTP default,
is used for plain text files; all other types of files are
transferred in binary mode).
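The extension-based rule described above can be illustrated with a short Python sketch. The set of "text" extensions is an assumption made for the example, not a standard list, and the function name is hypothetical.

```python
import os

# Illustrative list of extensions treated here as plain text; it is an
# assumption for this sketch, not a standard list.
TEXT_EXTENSIONS = {".txt", ".htm", ".html", ".tex", ".sgml", ".tsv"}


def transfer_mode(filename):
    """Guess the FTP transfer mode ("ascii" or "binary") from the extension."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown or missing extensions fall back to binary, which transfers the
    # bytes unchanged; an ASCII transfer could corrupt non-text data.
    return "ascii" if ext in TEXT_EXTENSIONS else "binary"
```

Falling back to binary is a design choice for safety; a file without an extension may in fact be text, which is one reason the extension alone is an imperfect guide.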

        In creating MARC records for Internet resources, catalogers
have been confused about where to include information about file
format.  Field 516 (Type of file) is a note field containing
general information about the nature and scope of the file described.
In some cases this information has been combined in the field with
file format (e.g. "Electronic journal in ASCII format").  In other
cases, field 538 (System requirements note) has been used, since
requirements for processing the file are dependent upon the type of
compression used or file format type.  

        File format is a data element included in the Dublin Core, a
list of core data elements needed for Internet resource discovery
and retrieval.  This list was developed by a wide range of
participants at the OCLC Metadata Workshop held in March 1995.  In
the mapping of the elements to MARC, field 538 was used for this
element (see Discussion Paper No. 86: Mapping the Dublin Core
Metadata Elements to USMARC).  However, this mapping is not
entirely adequate, since the field can contain information other
than file format, and since file format has also been recorded in
other MARC fields.

        If a subfield were defined in field 856 for file format, then
the information could be given at the level of the location, rather
than for the intellectual work as a whole.  In recent discussions
of whether separate records need to be created for different file
formats, the majority of respondents have endorsed using one record
for the intellectual work, with field 856 repeated for each file
format.  Recording such information within field
856 would allow for the file format to be associated with a
particular file at a particular host.  However, other note fields
would still be available for recording file format if this were
desirable.

        File format type is often referred to as "MIME type".  The
extension to a filename frequently indicates the file format.  An
Internet Request for Comments (RFC 1521), "MIME (Multipurpose Internet
Mail Extensions)", specifies the type of data in mail messages,
although its use has generally been extended to other types of
resources that reside on the Internet.  It includes content types and subtypes and
defines a registration process that uses the Internet Assigned
Numbers Authority (IANA) as a central registry for specific values.

        If subfield $q were redefined as File format type, the
question arises as to how to record the data.  Would a standardized
list of file format types be maintained, or would free text be
used?  If it were desirable to maintain a list, it should be
consistent with other established lists.  It may be necessary to use
the types registered with IANA, as specified in RFC 1521.
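If a controlled list were chosen, a system could at least check that a $q value has MIME syntax. The Python sketch below assumes that only the seven top-level types of RFC 1521 are controlled and that any subtype is accepted as given, since registered and private ("x-") subtypes change over time; the function name is hypothetical.

```python
# The seven top-level content types defined in RFC 1521 (Attachment A).
MIME_TOP_LEVEL = {"text", "multipart", "message", "application",
                  "image", "audio", "video"}


def parse_file_format(value):
    """Split a subfield $q value such as "image/gif" into (type, subtype)."""
    # MIME type names are case-insensitive, so normalize to lowercase.
    media, sep, rest = value.strip().lower().partition("/")
    # Drop any parameters, e.g. "text/plain; charset=us-ascii".
    subtype = rest.split(";", 1)[0].strip()
    if not sep or not subtype or media not in MIME_TOP_LEVEL:
        raise ValueError(f"not a MIME type: {value!r}")
    return media, subtype
```

Under this check, a value such as "binary" from the current definition of $q would be rejected, which illustrates why the question of existing data matters.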

        Attachment A contains Appendix F from RFC 1521, the summary of
seven content-types as defined in the MIME standard.  Attachment B
contains a list of MIME types with a mapping to file extensions.

        Questions to consider:
        1. Has subfield $q been used so extensively that a
redefinition would not be desirable?  Do users need a separate subfield
for the information about binary or ASCII transfer?  Only subfields
$e and $y are available in field 856 if a new subfield is needed
for file format type.  Or is it desirable to include binary or
ASCII with file format type in this newly defined subfield? 

        2. Should the data in the subfield be a controlled list or
free text? If a controlled list, what should be the authoritative
source?                         


4.      PROPOSED CHANGES

The following is presented for consideration:

        -       In the USMARC Holdings/Bibliographic Formats, define the
                following value in Field 856, First indicator:                                  
                8       Other

        -       In the USMARC Holdings/Bibliographic Formats, redefine
                subfield $q (File transfer mode) as File format type.
        See Attachment C for the field description as it would read if
this proposal were approved.
------------------------------------------------------------------
                                              ATTACHMENT A
RFC 1521                    MIME                    September 1993


Appendix F -- Summary of the Seven Content-types

   Content-type: text

   Subtypes defined by this document:  plain

   Important Parameters: charset

   Encoding notes: quoted-printable generally preferred if an
   encoding is needed and the character set is mostly an ASCII
   superset.

   Security considerations: Rich text formats such as TeX and Troff
   often contain mechanisms for executing arbitrary commands or file
   system operations, and should not be used automatically unless
   these security problems have been addressed.  Even plain text may
   contain control characters that can be used to exploit the
   capabilities of "intelligent" terminals and cause security
   violations.  User interfaces designed to run on such terminals
   should be aware of and try to prevent such problems.

   ________________________________________________________
   Content-type: multipart

   Subtypes defined by this document: mixed, alternative, digest,
   parallel.

   Important Parameters: boundary

   Encoding notes: No content-transfer-encoding is permitted.

   ________________________________________________________
   Content-type: message

   Subtypes defined by this document: rfc822, partial, external-body

   Important Parameters: id, number, total, access-type, expiration,
   size, permission, name, site, directory, mode, server, subject

   Encoding notes: No content-transfer-encoding is permitted.
      Specifically, only "7bit" is permitted for "message/partial" or
      "message/external-body", and only "7bit", "8bit", or "binary"
      are permitted for other subtypes of "message".
   ______________________________________________________________
   Content-type: application

   Subtypes defined by this document:  octet-stream, postscript

   Important Parameters:  type, padding

   Deprecated Parameters: name and conversions were
                          defined in RFC 1341.

   Encoding notes: base64 preferred for unreadable subtypes.

   Security considerations: This type is intended for the
   transmission of data to be interpreted by locally-installed
   programs.  If used, for example, to transmit executable
   binary programs or programs in general-purpose interpreted
   languages, such as LISP programs or shell scripts, severe
   security problems could result.  Authors of mail-reading
   agents are cautioned against giving their systems the power to
   execute mail-based application data without carefully
   considering the security implications.  While it is certainly
   possible to define safe application formats and even safe
   interpreters for unsafe formats, each interpreter should be
   evaluated separately for possible security problems.
   ________________________________________________________________
   Content-type: image

   Subtypes defined by this document:  jpeg, gif

   Important Parameters: none

   Encoding notes: base64 generally preferred
   ________________________________________________________________
   Content-type: audio

   Subtypes defined by this document:  basic

   Important Parameters: none

   Encoding notes: base64 generally preferred
   ________________________________________________________________
   Content-type: video

   Subtypes defined by this document:  mpeg

   Important Parameters: none

   Encoding notes: base64 generally preferred






Borenstein & Freed                                            
[Page 75]
------------------------------------------------------------------
                                              ATTACHMENT B
Mapping of MIME types to file extensions

MIME type                               File extension

application/activemessage
application/andrew-inset                     
application/applefile
application/atomicmail                         
application/dca-rft                            
application/dec-dx                             
application/mac-binhex40
application/macwriteii
application/msword
application/news-message-id                    
application/news-transmission                  
application/octet-stream                        bin             
application/oda                                 oda
application/pdf                                 pdf
application/postscript                          ai eps ps       
application/remote-printing                    
application/rtf                                 rtf             
application/slate                              
application/x-compressed                        Z
application/x-mif                               mif
application/wita                               
application/wordperfect5.1                      wp
application/x-csh                               csh             
application/x-dvi                               dvi             
application/x-hdf                               hdf             
application/x-latex                             latex           
application/x-netcdf                            nc cdf          
application/x-powerpoint                        ppt
application/x-sh                                sh              
application/x-tcl                               tcl             
application/x-tex                               tex             
application/x-texinfo                           texinfo texi   
application/x-troff                             t tr roff       
application/x-troff-man                         man             
application/x-troff-me                          me              
application/x-troff-ms                          ms              
application/x-wais-source                       src             
application/zip                                 zip             
application/x-bcpio                             bcpio           
application/x-cpio                              cpio            
application/x-gtar                              gtar            
application/x-shar                              shar            
application/x-sv4cpio                           sv4cpio         
application/x-sv4crc                            sv4crc          
application/x-tar                               tar             
application/x-ustar                             ustar           
audio/basic                                     au snd          
audio/x-aiff                                    aif aiff aifc
audio/x-wav                                     wav             
image/gif                                       gif             
image/ief                                       ief             
image/jpeg                                      jpeg jpg jpe jif
image/tiff                                      tiff tif        
image/x-cmu-raster                              ras
image/x-pcx                                     pcx
image/x-portable-anymap                         pnm             
image/x-portable-bitmap                         pbm             
image/x-portable-graymap                        pgm             
image/x-portable-pixmap                         ppm             
image/x-rgb                                     rgb
image/x-xbitmap                                 xbm             
image/x-xpixmap                                 xpm             
image/x-xwindowdump                             xwd             
message/external-body
message/news
message/partial
message/rfc822
multipart/alternative
multipart/appledouble
multipart/digest
multipart/mixed
multipart/parallel
text/html                                       html
text/plain                                      txt
text/richtext                                   rtx             
text/tab-separated-values                       tsv             
text/x-setext                                   etx             
text/x-sgml                                     sgml sgm
video/mpeg                                      mpeg mpg mpe    
video/quicktime                                 qt mov          
video/x-msvideo                                 avi             
video/x-sgi-movie                               movie           
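A table in this two-column layout can be inverted into an extension-to-MIME-type lookup. The Python sketch below copies a few rows from the table above; the function names are hypothetical. (Python's standard mimetypes module offers a similar lookup, but its table depends on the local system.)

```python
# A few rows copied from the table above, in the same layout:
# MIME type, then zero or more file extensions separated by spaces.
TABLE = """\
application/octet-stream                        bin
application/postscript                          ai eps ps
application/x-compressed                        Z
image/gif                                       gif
image/jpeg                                      jpeg jpg jpe jif
text/plain                                      txt
video/quicktime                                 qt mov
message/rfc822
"""


def extension_index(table):
    """Invert the table into a mapping from file extension to MIME type."""
    index = {}
    for line in table.splitlines():
        fields = line.split()
        if not fields:
            continue
        mime_type, extensions = fields[0], fields[1:]
        for ext in extensions:
            index[ext] = mime_type
    return index


def mime_type_for(filename, index):
    """Look up the MIME type for a filename by its final extension."""
    if "." not in filename:
        return None
    return index.get(filename.rsplit(".", 1)[1])
```

The lookup is case-sensitive so that the uppercase "Z" in the table is matched exactly; types with no listed extension, such as message/rfc822, simply never appear in the index.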

Additional information on file types (these documents also indicate
whether the file is to be transferred as ASCII or binary):
"List of file extensions", by Allison Zhang  
URL: http://ac.dal.ca/~dong/contents.html
"Common Internet file formats", compiled by Eric Perlman and Ian
Kallen 
URL: http://www.matisse.net/files/formats.html

------------------------------------------------------------------
                                              ATTACHMENT C
<  > indicates addition; [  ] indicates deletion
856   Electronic Location and Access  (R)

Indicators

  First        Access method
    0            Email
    1            FTP
    2            Remote login (Telnet)
    3            Dial-up
    7            Method specified in subfield $2
    <8           Other>
  
  Second       Undefined
    #            Undefined

Subfield Codes

    $a     Host name  (R)
    $b     Access number  (NR)
    $c     Compression information  (R)
    $d     Path  (R)
    $f     Electronic name  (R)
    $g     Electronic name--End of range  (R)
    $h     Processor of request  (NR)
    $i     Instruction  (R)
    $j     Bits per second  (NR)
    $k     Password  (NR)
    $l     Logon/login  (NR)
    $m     Contact for access assistance  (R)
    $n     Name of location of host in
                subfield $a   (NR)
    $o     Operating system  (NR)
    $p     Port  (NR)
    $q     File <format type> [transfer mode]  (NR)
    $r     Settings  (NR)
    $s     File size  (R)
    $t     Terminal emulation  (R)
    $u     Uniform Resource Locator  (R)
    $v     Hours access method available  (R)
    $w     Record control number  (R)
    $x     Nonpublic note  (R)
    $z     Public note  (R)
    $2     Access method  (NR)
    $3     Materials specified  (NR)


FIELD DEFINITION AND SCOPE

       This field contains the information required to locate an
electronic item.  The information identifies the electronic
location containing the item or from which it is available.  It
also contains information to retrieve the item by the access method
identified in the first indicator position.  The information
contained in this field is sufficient to allow for the electronic
transfer of a file, subscription to an electronic journal, or logon
to an electronic resource.  In some cases, only unique data
elements are recorded which allow the user to access a locator
table on a remote host containing the remaining information needed
to access the item.

       Field 856 is repeated when the location data elements vary
(subfields $a, $b, $d) and when more than one access method may be
used.  It is also repeated whenever the electronic filename varies
(subfield $f), except when a single intellectual item is divided
into different parts for online storage or retrieval.

------------------------------------------------------------------
GUIDELINES FOR APPLYING CONTENT DESIGNATORS


 INDICATORS

First Indicator - Access method 
       The first indicator position contains a value that defines how
       the rest of the data in the field will be used.  If the
       resource is available by more than one access method, the
       field is repeated with data appropriate to each method.  The
       methods defined are the main TCP/IP (Transmission Control
       Protocol/Internet Protocol) protocols.

       The value in the first indicator position determines which
       subfields are appropriate for use.  For example, when first
       indicator value 1 (FTP) is used, subfields $d (Path), $f
       (Electronic name), $c (Compression information), and $s (File
       size) are appropriate, whereas they would not be with first
       indicator value 2 (Remote login (Telnet)).
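The dependency between the first indicator and the subfields can be checked mechanically. The Python sketch below encodes only the one pairing stated in the example above ($d, $f, $c, and $s suit value 1 but not value 2); a complete table would have to come from the format documentation, and the function name is hypothetical.

```python
# Subfields that, per the example in the text, are appropriate with first
# indicator value 1 (FTP) but not with value 2 (Remote login (Telnet)).
FTP_ONLY_SUBFIELDS = {"d", "f", "c", "s"}


def inappropriate_subfields(ind1, subfields):
    """Return, in field order, the codes of subfields not appropriate for ind1."""
    if ind1 == "2":
        return [code for code, _ in subfields if code in FTP_ONLY_SUBFIELDS]
    # Only the value 2 case is encoded in this sketch.
    return []
```

For example, a Telnet field carrying a path ($d) and a filename ($f) would be flagged, while the same subfields under value 1 would not.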

       0  - Email
           Value 0 indicates that access to the electronic resource is
           through electronic mail (email).  This access includes
           subscribing to an electronic journal or electronic forum
           through software intended to be used by an email system.

       1 - FTP
           Value 1 indicates that access to the electronic resource
           is through the File Transfer Protocol (FTP).  Additional
           information in other subfields may enable the user to
           transfer the resource electronically.

       2 - Remote login (Telnet)
           Value 2 indicates that access to the electronic resource is
           through remote login (Telnet).  Additional information in
           subfields of the record may enable the user to connect to
           the resource electronically.

       3 - Dial-up
           Value 3 indicates that access to the electronic resource is
           through a conventional telephone line (dial-up).  Additional
           information in subfields of the record may enable the user
           to connect to the resource.

       7 - Method specified in subfield $2
           Value 7 indicates that access to the electronic resource is
           through a method other than the defined values and for which
           an identifying code is given in subfield $2 (Access
           method).

       <8 - Other
           Value 8 indicates that access to the electronic resource is
           not specified by one of the other values or by a code in
           subfield $2.>

Second Indicator - Undefined 
       The second indicator position is undefined and contains a
       blank (#).


 SUBFIELD CODES

$a - Host name
       Subfield $a contains the fully qualified domain name (host name) of
       the electronic location.  It contains a network address which
       is repeated if there is more than one address for the same
       host.  The convention for a BITNET address is to add .bitnet.

         856     1#$aharvarda.harvard.edu$aharvarda.bitnet

------------------------------------------------------------------

$n - Name of location of host in subfield $a
       Subfield $n contains the conventional name of the location of
       the host in subfield $a, including its physical (geographic)
       location.

         856     2#$apucc.princeton.edu$nPrinceton University, Princeton,
                 N.J.


$o - Operating system
       For informational purposes, the operating system used by the host
       specified in subfield $a is recorded here.  Conventions for
       the path and filenames may be dependent on the operating
       system of the host.  For the operating system of the resource
       itself (i.e., the item represented by the title recorded in
       field 245), rather than the operating system of the host
       making it available, field 753 (Technical Details Access to
       Computer Files), subfield $c (Operating system) is used.

         856     1#$ars7.loc.gov$d/pub/soviet.archive$fk1famine.bkg
                 $nLibrary of Congress, Washington, D.C.$oUNIX


$p - Port
       Subfield $p contains the portion of the address that
       identifies a process or service in the host.

         856     2#$amadlab.sprl.umich.edu$nUniversity of Michigan
                 Weather Underground$p3000


$q - File <format type> [transfer mode] 
       Subfield $q contains an identification of the file <format
       type> [transfer mode]. <File format types specify the nature of
       the data and how it is used, and include what is generally known
       as a MIME type.  The value may include both the type (e.g., image)
       and the subtype (e.g., jpeg).  The file format type also>
       determines how data are transferred through a network. 
       Usually, a text file can be transferred as character data,
       which generally restricts the text to characters in the ASCII
       (American Standard Code for Information Interchange
       (ANSI X3.4)) character set (i.e., the basic Latin alphabet,
       digits 0-9, a few special characters, and most punctuation
       marks).  Text files with characters outside of the ASCII set,
       or non-textual data (e.g., computer programs, image data) must
       be transferred using another file transfer mode, usually
       binary mode.

         856     1#$aarchive.cis.ohio-state.edu
                 $dpub/comp.sources.unix/volume10$fcomobj.lisp.10.Z
                 $q[binary]<application/x-compressed>
         [File is UNIX compressed]

         <856    7#$3NYDA.1993.010.00130
                 $uhttp://www.cc.columbia.edu/imaging/photocd/3009-1031-1443/IMG0089.512.gif
                 $qimage/gif$2http>


$r - Settings
       Subfield $r contains the settings used for transferring data. 
       Included in settings are: 1) Number Data Bits (the number of
       bits per character); 2) Number Stop Bits (the number of bits
       to signal the end of a byte); and 3) Parity (the parity
       checking technique used).  The syntax of these elements is:

         <Parity>-<Number Data Bits>-<Number Stop Bits>

       If only the parity is given, the other elements of settings
       and their related hyphens are omitted (i.e., "<Parity>").  If
       one of the other two elements is given, the hyphen for the
       missing element is recorded in its proper position (i.e.,
       "<Parity>--<Number Stop Bits>" or "<Parity>-<Number Data
       Bits>-").
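The hyphen rules for subfield $r can be expressed as a small formatting function. This Python sketch assumes that parity is always present (as the rules above imply) and that it is written as a short code such as "E" or "N"; the excerpt does not list the parity codes, and the function name is hypothetical.

```python
def format_settings(parity, data_bits=None, stop_bits=None):
    """Build a $r value as <Parity>-<Number Data Bits>-<Number Stop Bits>."""
    if data_bits is None and stop_bits is None:
        # Only the parity is given: omit the other elements and their hyphens.
        return parity
    # One element may still be missing: keep its hyphen in position.
    data = "" if data_bits is None else str(data_bits)
    stop = "" if stop_bits is None else str(stop_bits)
    return f"{parity}-{data}-{stop}"
```

This reproduces each pattern described above: "<Parity>", "<Parity>--<Number Stop Bits>", and "<Parity>-<Number Data Bits>-".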


Library of Congress Help Desk (09/02/98)