PROPOSAL NO: 96-7

DATE: May 6, 1996
REVISED:

NAME: Changes to FTP File Label Specifications for Electronic Files of USMARC Records

SOURCE: Library of Congress

SUMMARY: This paper proposes changes, originally put forward by the participants in the European CoBRA FLEX Project 10164, to the file label that is used for files of USMARC records transferred via the File Transfer Protocol (FTP). Additional fields are proposed that have been deemed necessary for the exchange of records in a variety of MARC formats.

KEYWORDS: FTP Label; File Transfer

RELATED: DP61 (Jan. 1993); 93-9 (June 1993); DP94 (Jan. 1996)

STATUS/COMMENTS:

5/6/96 - Forwarded to USMARC Advisory Group for discussion at the July 1996 MARBI meetings.

7/6/96 - Accepted with the following change: Proposal 2. The end-of-field marker may be either carriage return (X'0D') or carriage return followed by line feed (X'0D''0A'). Do not use number sign or the current X'1E'.

8/6/96 - Result of final LC review - Agreed with MARBI decision.


PROPOSAL NO. 96-7:  Changes to FTP File Label Specifications for Electronic Files

1.     INTRODUCTION

The European library community has been investigating the use of
the Internet File Transfer Protocol (FTP) for the electronic
exchange of bibliographic data.  The European Commission's
Libraries Programme through CoBRA (Computerized Bibliographic
Record Actions) has funded the FLEX (File Label EXchange) Project
10164 to investigate the need for standards in this area, and "to
suggest a suitable file labelling and naming format".

The participants in the FLEX Project understand that without
standardization in the way files are described within the label
file, it would become increasingly difficult to exchange
bibliographic information internationally.  Because the USMARC
specification for electronic file transfer has been widely reviewed
by the USMARC community and is now in use by many exchange partners
of bibliographic records, the FLEX project participants have
proposed that the USMARC specification be used as the base
specification.  However, they have proposed some enhancements to
that specification to take into account a European dimension for
exchanging and processing bibliographic data.


In addition, the FLEX Project participants have suggested a file
naming convention for use when certain operating system constraints
apply.


2.     PROPOSED CHANGES

See Attachment A for an FTP File label example.  See Attachment B
for revised definitions of the fields.

Proposal 1.  Change label file character set

It is proposed that the character set of the label file conform to
ISO 646-IRV or ASCII.  (There are two differences between
ISO 646-IRV and ASCII:  1) ISO 646-IRV character position "24" is
the universal currency symbol whereas this character is the "$"
symbol in ASCII; 2) ISO 646-IRV character position "7E" is an
overline or tilde whereas this character is the tilde in ASCII.
These differences should not be problematic.)
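
As an illustration only, a receiving system might check that a label
file stays within the 7-bit graphic range shared by ISO 646-IRV and
ASCII before processing it.  The sketch below (Python) is not part of
the specification.  The function name is hypothetical, and treating
carriage return and line feed as the only permitted control
characters is an assumption drawn from Proposal 2.

```python
def find_non_text_bytes(label: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes outside the expected range."""
    # Assumption: CR and LF are the only control characters a label may contain.
    allowed_controls = {0x0D, 0x0A}
    problems = []
    for offset, value in enumerate(label):
        is_graphic = 0x20 <= value <= 0x7E  # 7-bit graphic positions, space to tilde
        if not is_graphic and value not in allowed_controls:
            problems.append((offset, value))
    return problems
```

A label that still carries the USMARC end-of-field character X'1E'
would be reported by this check, since X'1E' is a control character
outside the assumed set.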


Proposal 2.  Change the end-of-field character symbol from the
current end-of-field marker (X'1E') to the number sign "#" (X'23'),
followed by a carriage return (X'0D') or carriage return/line feed
(X'0D''0A') depending on operating systems used.

There was objection to using the USMARC end-of-field character
(X'1E') in what was felt should be a text file.  It is, therefore,
proposed that the same end-of-field characters that are currently
used in the diskette FTP file label specification be used in this
file label specification.  These characters can be supplied by any
operating system.
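
The end-of-field convention described above can be sketched as
follows (Python).  This is an illustrative sketch, not a definitive
implementation.  The function name is hypothetical.  The pattern
treats the number sign as optional so that it also accepts a bare
carriage return or carriage return/line feed, the form recorded in
the 7/6/96 status comment at the head of this paper.

```python
import re

# An optional "#" followed by CR or CR/LF.  A "#" inside the data is left alone
# because it is not followed by a carriage return.
_END_OF_FIELD = re.compile(r"#?\r\n?")


def split_fields(label_text: str) -> list[str]:
    """Split label text into fields, dropping the end-of-field characters."""
    fields = _END_OF_FIELD.split(label_text)
    # The text after the last terminator is empty; discard empty pieces.
    return [field for field in fields if field]
```

One consequence of making the number sign optional is that data
ending in a genuine "#" would lose that character; a stricter reader
would fix one convention per exchange agreement.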


Proposal 3.  Add optional field CID (Country Identifier)

Field ORS (Originating System ID) is, in some cases, insufficient
to identify the originating system.  When necessary, the CID
(Country Identifier) field would be used with the ORS field but its
use would remain optional.  The country identifier would be the
two-character alpha code defined by ISO 3166 (Codes for the
Representation of Names of Countries).


Proposal 4.  Make the FOR Field (Format) mandatory

It is proposed that the existing FOR field (Format) be made
mandatory to identify the structural format standard used for
records in the file.  For example, "M" = Z39.2 (or its equivalent
ISO 2709), and "S" = SGML (ISO 8879).


Proposal 5.  Add optional field FQF (Format Qualifier)

Field FOR (Format) is insufficient in itself to completely describe
the format of the record file (e.g., for identifying a particular
tag set/specification for Z39.2 records or a particular DTD for
SGML records).  The FQF (Format Qualifier) field would be used in
conjunction with the FOR (Format) field but its use would remain
optional.  It is proposed that the FQF field follow immediately
after the FOR field in field sequence.  The content of the FQF
field would be taken from a list of formats (e.g., similar to the
list of MARC format types in the Z39.50 Registered Record
Syntaxes**) and DTDs.  For SGML files the DTD is indicated by the
highest level tag in the document instance (or in the tag DOCTYPE
in the DTD itself).
**(http://www.loc.gov/z3950/agency/objects/syntax.html)

       Examples:      FQF  USMARC
                      FQF  BOOK SYSTEM "iso12083-book.dtd"  (DTD
                      specified in ISO 12083)


Proposal 6.  Add optional fields CS<0-n> (Character Set<0-n>)

To assist specifying character sets and character set variations,
it is proposed that two sets of fields be added. The first are
CS<0-n> (Character Set <0-n>) which specify the character sets
found in the file.  CS0 would specify the initial character set
needed for processing the records in the file.  This indicates, at
least, the G0 set needed.  It may indicate an 8-bit set in which
case it is more than the G0 set.  For USMARC, it can be specified
as either ASCII (the G0 part of the USMARC character set) or as
USMARC.

CS1 indicates an additional set needed in the file; CS2 indicates
another character set used in the file; etc.  The content of each
CS<0-n> would equate to a particular international standard
character set identifier (e.g., extended Latin ISO 5426 - 1983), an
ISO registration number (e.g., Registry #37), text (e.g., USMARC),
or a reference to a private character set.  If the field content
represents a private character set then the reader should be
pointed to the NOT field (Notes) for further information on
processing requirements or the REP (Reply To) for a person to
contact.  An occurrence could specify an additional control set
such as ISO 6630.

The use of the CS0 field is redundant for USMARC records.  Once
the USMARC format is defined (in FOR and FQF), the initial
character set is implied.  In the USMARC context, the use of the
CS1-n fields is also redundant as character sets are specified in
each record.  (The absence of an 066 implies USMARC Roman is used.)
The USMARC 066 field of a record identifies (implicitly or
explicitly) all character sets used in the record.  This may be
different for other MARC formats, however.  Likewise an SGML DTD
indicates the character sets internally in the CHARSET tag
(although it is not carried in a document instance that does not
have the DTD attached).

       Example:       CS0    USMARC Roman


Proposal 7.  Add optional fields CV<0-n> (Character Variation<0-n>)

It is proposed that an additional field be used to provide
information on variations to the character sets specified in 
CS<0-n>, if the sets noted 1) are not used strictly according to
the standard, 2) have options for some positions that need to be
specified, or 3) have additional characters in positions that are
undefined in the standard.

       Example:       CS0  ISO 646-Basic
                      CV0  2/3=number sign; 7/14=umlaut
                      CS1  ISO 5426
                      CV1  4/9 not used


Proposal 8.  Add optional field FDI (Final Destination
Identification)

The FDI field is intended to assist those organizations that
exchange bibliographic information with a large recipient community
in identifying the intended customer.  The field would be used to
contain the name or identifier of the final-destination database.
An example requiring this method of identification would be a PUT
transfer to a central customer point, where additional information
is required by this central point to determine the final
destination for the records.

It is proposed that this field follow the ISS field (Issue).


                                         ATTACHMENT A

FTP LABEL EXAMPLE


  DAT##19951221211236.0#
  RBF##1564#
  DSN##LOC.BOOKS.DIST.DATA.D951221#
  ORS##DLC#
  CID##US#
  DTS##19951222013000.0#
  DTR##1995122119951221#
  FOR##M#
  FQF##USMARC#
  DES##MUMS Books Daily DQ#
  CS0##USMARC#
  CS1##USMARC Hebrew#
  VOL##V21#
  ISS##I50#
  FDI##Hebraic Resource File--RS10#
  REP##NDMSO@LOC.GOV#
  NOT##Test set of Hebrew records#


"#" at end-of-field in above example is not a
 space, but is a graphic character ("#")
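
As an illustration only, a line of the printed example above could be
read into a tag and its data as sketched below (Python).  The sketch
assumes the layout seen in the example: a three-character tag, two
filler positions (printed as "##"), the data, and a final "#".  That
layout is inferred from the example rather than stated as a rule, and
the function name is hypothetical.

```python
def parse_label_line(line: str) -> tuple[str, str]:
    """Split one printed label line into (tag, data)."""
    text = line.strip()
    if len(text) < 6 or not text.endswith("#"):
        raise ValueError(f"not a label field: {line!r}")
    tag = text[:3]
    # Assumption: positions 4-5 are filler; the final "#" is the end-of-field sign.
    data = text[5:-1]
    return tag, data


sample = [
    "  DAT##19951221211236.0#",
    "  FOR##M#",
    "  FDI##Hebraic Resource File--RS10#",
]
parsed_sample = [parse_label_line(line) for line in sample]
```

Only the final "#" is removed, so data that itself contains a number
sign is left intact.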



                                         ATTACHMENT B

PROPOSED CHANGES TO THE FTP FILE LABEL
       Below is a summary of the enhanced file label
       specification with changes indicated.  [] indicates text
       to be deleted; <> indicates text to be added.

Tag    Element Name                 Description                  M/O    F/V    R/NR

DAT    Date Compiled                YYYYMMDDHHMMSS.F              M      F      NR
RBF    Number of Records            Numeric                       M      V      NR
DSN    Data Set Name                Alphanumeric                  M      V      NR
ORS    Origin. System ID            Alphanumeric                  M      V      NR
<CID   Country ID                   Alphanumeric                  O      F      NR>
DTS    Date Sent                    YYYYMMDDHHMMSS.F              O      F      NR
DTR    Dates of Records             YYYYMMDDYYYYMMDD              O      F      NR
FOR    Format                       Alphanumeric            [O] <M>      F      NR
<FQF   Format Qualifier             Alphanumeric                  O      V      NR>
DES    Description                  Alphanumeric                  O      V      R
<CS0-n Character Set 0-n            Alphanumeric                  O      V      NR>
<CV0-n Char. Var. 0-n               Alphanumeric                  O      V      NR>
VOL    Volume                       Alphanumeric                  O      V      R
ISS    Issue                        Alphanumeric                  O      V      R
<FDI   Final Dest. ID               Alphanumeric                  O      V      NR>
REP    Reply to                     Alphanumeric                  O      V      R
NOT    Note                         Alphanumeric                  O      V      R
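
The M/O and R/NR columns of the table above lend themselves to a
simple check on the tags found in a label.  The sketch below (Python)
is illustrative only.  The names are hypothetical, and the numbered
CS<n> and CV<n> fields are deliberately left out of the repeatability
check because their tags vary with the index.

```python
from collections import Counter

# Taken from the M/O column of the table (FOR is mandatory as proposed).
MANDATORY = {"DAT", "RBF", "DSN", "ORS", "FOR"}
# Taken from the R/NR column, excluding the numbered CS<n>/CV<n> fields.
NOT_REPEATABLE = MANDATORY | {"CID", "DTS", "DTR", "FQF", "FDI"}


def check_label_tags(tags):
    """Return a list of problems found; an empty list means none were found."""
    counts = Counter(tags)
    problems = [
        f"missing mandatory field {tag}"
        for tag in sorted(MANDATORY - set(counts))
    ]
    problems += [
        f"field {tag} occurs {n} times but is not repeatable"
        for tag, n in sorted(counts.items())
        if tag in NOT_REPEATABLE and n > 1
    ]
    return problems
```

Repeatable fields such as DES, VOL, ISS, REP and NOT may occur any
number of times without being reported.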


DAT    (Date compiled):  Mandatory; Fixed length; Not
       repeatable.  This is the date the originating system
       completed the compilation of the file of records.  This
       is not the date of the creation of the records contained
       in the bibliographic file.  The field is recorded
       according to _Representation for Calendar Date and Ordinal
       Date for Information Interchange_ (ANSI X3.30) and
       _Representations of Local Time of the Day for Information
       Interchange_ (ANSI X3.43).  The date requires 8 numeric
       characters in the pattern yyyymmdd (4 for the year, 2 for
       the month, and 2 for the day; right justified and zero
       filled).  The time requires 8 numeric characters in the
       pattern hhmmss.f (2 for the hour, 2 for the minute, 2 for
       the second, and 2 for a decimal fraction of the second,
       including the decimal point).  The 24-hour clock is used.

RBF    (Number of records in file):  Mandatory; Variable length;
       Not repeatable.  This element contains the number of
       logical records contained in the file of USMARC records.

DSN    (Data Set Name):  Mandatory; Variable length; Not
       repeatable.  The filename of the file of USMARC records
       (which is sent separately) for which this is a file
       label.

ORS    (Originating system ID):  Mandatory; Variable length; Not
       repeatable. The name of the system that compiled the
       files of records.  This could be a symbol (e.g., OCLC or
       NUC) or text.

<CID   (Country ID):  Optional; Fixed length; Not repeatable. 
       The country identifier of the system that compiled the
       files of records.  The identifier would be taken from
       Codes for Representation of Names of Countries (ISO
       3166).>

DTS    (Date sent):  Optional; Fixed length; Not repeatable. 
       This is the date of transmission of the file of USMARC
       records.  The field is recorded according to
       _Representation for Calendar Date and Ordinal Date for
       Information Interchange_ (ANSI X3.30) and _Representations
       of Local Time of the Day for Information Interchange_
       (ANSI X3.43).  The date requires 8 numeric characters in
       the pattern yyyymmdd (4 for the year, 2 for the month,
       and 2 for the day; right justified and zero filled).  The
       time requires 8 numeric characters in the pattern
       hhmmss.f (2 for the hour, 2 for the minute, 2 for the
       second, and 2 for a decimal fraction of the second,
       including the decimal point).  The 24-hour clock is used.

DTR    (Dates of records):  Optional; Fixed length; Not
       repeatable.  This contains the inclusive dates of last
       transaction of the records in the file, i.e., the first
       and last date recorded in the 005 fields of the file of
       records.  The field is recorded according to
       _Representation for Calendar Date and Ordinal Date for
       Information Interchange_ (ANSI X3.30).  The date requires
       16 numeric characters in the pattern yyyymmddyyyymmdd (4
       for the year, 2 for the month, and 2 for the day for each
       date; right justified and zero filled).

FOR    (Format):  <Mandatory>; Fixed length; Not repeatable. 
       This element designates the format of the records,
       generally M for <Z39.2 or ISO 2709> (MARC) <, S for ISO
       8879 (SGML)>.

<FQF   (Format qualifier):  Optional; Variable length; Not
       repeatable.  This element provides additional description
       of the format of the record file.  For example, it may
       identify a particular tag set/specification for MARC
       records or a particular DTD for SGML records.  For MARC
       formats, the content of the FQF field may be text or a
       code from the list:  Z39.50 Registered Record Syntaxes. 
       For DTDs, the content is the identifier in the DTD
       DOCTYPE field.>

DES    (Description of records):  Optional; Variable length;
       Repeatable.  This element describes the records.  The
       data could be coded or describe a product name.  (For
       example, OCLC uses B for Bibliographic describing a data
       type; CDS may use a product name, such as MDS-Books All.)

<CS0-n (Character set <0-n>):  Optional; Variable length; Not
       repeatable.  These fields specify the character sets (control
       and/or graphic) needed for processing the record data file. 
       The field content is text indicating a particular set (e.g.,
       ISO 646-IRV, ISO Registry #37, USMARC, or a reference to a
       private character set).  CS0 indicates at least the G0 set and
       CS<1-n> indicate other sets in the file.>

<CV0-n (Character variation <0-n>):  Optional; Variable length;
       Repeatable.  These fields are used in conjunction with the CS
       fields and contain a textual description of the variations
       from the set specified in the corresponding CS field. 
       Variations may be because the set noted 1) is not used
       strictly according to the standard, 2) has options for some
       positions that need to be specified, or 3) has additional
       characters in positions that are undefined in the standard.> 


VOL    (Volume):  Optional; Variable length; Repeatable.  This
       may be used if it is desirable to assign a volume number
       when distribution of records is by subscription.  Each
       file within a subscription year may be given a volume and
       issue number.

ISS    (Issue):  Optional; Variable length; Repeatable.  This
       may be used if it is desirable to assign a volume and
       issue number when distribution of records is by
       subscription.  Each file within a subscription year may
       be given a volume and issue number.  It may be combined
       with Volume (e.g., V1402).

<FDI   (Final destination ID):  Optional; Variable length; Not
       repeatable.  This field would contain the name or
       identifier of the final-destination database.>

REP    (Reply to):  Optional; Variable length; Repeatable.  This
       field contains an address given as a contact for
       problems/questions in transmission.  It may include an
       Internet or postal address.

NOT    (Note):  Optional; Variable length; Repeatable.  This
       field contains textual information or messages about the
       file.
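
The date patterns given above for DAT, DTS and DTR can be read as
sketched below (Python).  This is an illustrative sketch, not part of
the specification.  The function names are hypothetical, and the
check that the last date does not precede the first is an added
assumption.

```python
from datetime import date, datetime


def parse_label_timestamp(value: str) -> datetime:
    """Parse a DAT or DTS value in the pattern YYYYMMDDHHMMSS.F (16 characters)."""
    if len(value) != 16:
        raise ValueError(f"expected 16 characters, got {len(value)}")
    # %f accepts one to six fraction digits, so the single digit F is
    # read as tenths of a second.
    return datetime.strptime(value, "%Y%m%d%H%M%S.%f")


def parse_record_dates(value: str) -> tuple[date, date]:
    """Parse a DTR value in the pattern YYYYMMDDYYYYMMDD into (first, last)."""
    if len(value) != 16 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected 16 numeric characters")
    first = datetime.strptime(value[:8], "%Y%m%d").date()
    last = datetime.strptime(value[8:], "%Y%m%d").date()
    if last < first:
        # Assumption: the two dates are given in chronological order.
        raise ValueError("last date precedes first date")
    return first, last
```

Applied to the values in Attachment A, the DAT value
19951221211236.0 reads as 21 December 1995 at 21:12:36, and the DTR
value 1995122119951221 reads as a range that begins and ends on
21 December 1995.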


Library of Congress Help Desk (09/02/98)