EAD Tag Library for Version 1.0
Overview of the EAD Structure (6)
High-Level Elements
At the most basic level, encoded archival finding aids consist of two segments: 1) a segment that provides information about the finding aid itself (its title, compiler, compilation date, etc.); and 2) a segment that provides information about a body of archival materials (a collection, a record group, a fonds, or a series). As shown in Figure 1, the EAD DTD splits the first segment into two high-level elements known as EAD Header <eadheader> and Front Matter <frontmatter>. (Elements in Figures 1 through 10 are in alphabetical order, unless the DTD prescribes a sequence for the elements.) The second segment, consisting of information about the archival materials, is contained within the third high-level element named Archival Description <archdesc>. All three of these high-level elements are contained within the outermost element named Encoded Archival Description <ead>, which wraps around the entire document.
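To make the three-part structure concrete, here is a minimal sketch. It assumes the finding aid is serialized as XML with lowercase element names and built with Python's standard xml.etree.ElementTree module. The function name build_ead_skeleton is hypothetical. It creates only the empty high-level elements in the order described above, not a complete or valid finding aid.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD): builds only the empty top-level shell.
def build_ead_skeleton(include_frontmatter: bool = True) -> ET.Element:
    """Return an <ead> root wrapping the high-level elements in order."""
    ead = ET.Element("ead")  # outermost wrapper for the whole document
    ET.SubElement(ead, "eadheader")  # segment 1: about the finding aid itself
    if include_frontmatter:
        ET.SubElement(ead, "frontmatter")  # segment 1, optional: title page etc.
    ET.SubElement(ead, "archdesc")  # segment 2: about the archival materials
    return ead


skeleton = build_ead_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
# <ead><eadheader /><frontmatter /><archdesc /></ead>
```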
EAD Header <eadheader> and Front Matter <frontmatter>
The <eadheader>, outlined in Figure 2, is modeled on the header element in the Text Encoding Initiative (TEI), an international, humanities-based effort to develop a suite of DTDs for encoding literary texts or other objects of study. In an attempt to encourage as much uniformity as possible in the provision of metadata across document types, EAD uses a TEI-like header to capture information about the creation, revision, publication, and distribution of finding aid instances. The resulting <eadheader> consists of four subelements, some of which are further subdivided:
- EAD Identifier <eadid> provides a unique identification number or code for the finding aid and can indicate the location, source, and type of the identifier.
- File Description <filedesc> contains much of the bibliographic information about the finding aid, including the name of the author, title, subtitle, and sponsor (all contained in the Title Statement <titlestmt>), as well as the edition, publisher, series, and related notes encoded separately.
- Profile Description <profiledesc> is used to record the language of the finding aid and information about who created the encoded version of the document, and when.
- Revision Description <revisiondesc> summarizes any revisions made to the EAD document. (See example under <eadheader> in the EAD Elements section of the tag library.)
The sequence of elements and subelements in the <eadheader> is specified by the DTD, with the expectation that searches across repositories will be more predictable if the elements are uniformly ordered. Such searches may help filter large bodies of machine-readable finding aids by specific categories, such as title, date, repository, and language. Required use of the <eadheader> compels archivists to include essential information about their machine-readable finding aids that often went unrecorded in paper form. In addition, elements in the <eadheader> may be used to generate electronic and printed title pages for finding aids.
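Because the header order is fixed, a processor can check it mechanically. The sketch below assumes XML serialization and takes the order of the four subelements from the list above; treating <eadid> and <filedesc> as required and the other two as optional is our assumption, and the DTD remains the authoritative content model. The name header_in_order is hypothetical.

```python
import xml.etree.ElementTree as ET

# Order taken from the list of <eadheader> subelements; the DTD is authoritative.
HEADER_ORDER = ["eadid", "filedesc", "profiledesc", "revisiondesc"]
REQUIRED = {"eadid", "filedesc"}  # assumption: the other two are optional


# Hypothetical helper (not part of EAD).
def header_in_order(eadheader: ET.Element) -> bool:
    """True if the children follow HEADER_ORDER, each at most once."""
    tags = [child.tag for child in eadheader]
    if not REQUIRED.issubset(tags):
        return False  # a required subelement is missing
    if len(tags) != len(set(tags)) or any(t not in HEADER_ORDER for t in tags):
        return False  # repeated or unexpected subelement
    positions = [HEADER_ORDER.index(t) for t in tags]
    return positions == sorted(positions)


good = ET.fromstring("<eadheader><eadid/><filedesc/><profiledesc/></eadheader>")
bad = ET.fromstring("<eadheader><filedesc/><eadid/></eadheader>")
print(header_in_order(good), header_in_order(bad))  # True False
```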
Because the elements within the <eadheader> must follow a prescribed order to ensure uniformity across finding aids, EAD also includes an optional <frontmatter> element, which can be used to generate a title page that follows local preferences for the sequencing of information. The <titlepage> subelement within <frontmatter> reuses many of the same subelements designated in <filedesc>. The <frontmatter> element can also be used to encode structures such as prefaces, dedications, or other text concerning the creation, publication, or use of the finding aid. (See examples under <frontmatter> and <titlepage> in the EAD Elements section of the tag library.)
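Since <titlepage> reuses subelements found in <filedesc>, a title page in a locally preferred order can be derived from the header rather than retyped. This is a minimal sketch, assuming XML serialization and that the reused elements sit in <filedesc>/<titlestmt>; the function name and the sample data are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD).
def titlepage_from_filedesc(eadheader: ET.Element, order: list) -> ET.Element:
    """Build <frontmatter><titlepage> from <titlestmt> data in a locally chosen order."""
    frontmatter = ET.Element("frontmatter")
    titlepage = ET.SubElement(frontmatter, "titlepage")
    titlestmt = eadheader.find("filedesc/titlestmt")
    if titlestmt is None:
        return frontmatter  # nothing to reuse
    for tag in order:  # local sequencing preference, not the DTD's header order
        for el in titlestmt.findall(tag):
            titlepage.append(copy.deepcopy(el))  # copy, so the header stays intact
    return frontmatter


# Made-up sample header.
header = ET.fromstring(
    "<eadheader><eadid>doe001</eadid><filedesc><titlestmt>"
    "<titleproper>Jane Doe Papers</titleproper>"
    "<author>Processed by A. Archivist</author>"
    "</titlestmt></filedesc></eadheader>"
)
fm = titlepage_from_filedesc(header, ["author", "titleproper"])
print([el.tag for el in fm.find("titlepage")])  # ['author', 'titleproper']
```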
Archival Description <archdesc>
As noted above and in Figure 1, the third high-level element in <ead> is Archival Description <archdesc>, which consists of information about a body of archival materials. Within this element is found hierarchically organized information that describes a unit of records or papers along with its component parts or divisions. It includes information about the content, context, and extent of the archival materials as well as optional supplemental information that facilitates their use by researchers. Finding aids generally describe a unit of records or papers at several different, but related, levels of detail. The <archdesc> element encompasses these unfolding, hierarchical levels by first allowing for a descriptive overview of the whole, followed by more detailed views of the parts, designated by the element Description of Subordinate Components <dsc>. Data elements available at the <archdesc> or unit level are repeated at the various component levels within <dsc>, and information is inherited from one hierarchical level to the next. In addition to serving as a wrapper for all the descriptive information about an entire body of archival materials, the <archdesc> element, through a LEVEL attribute, identifies the highest tier of the materials being described.
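The DTD itself does not compute inheritance; a display or search system has to resolve it. One way to do that is sketched below, assuming XML serialization: look for a <did> subelement on the component and, failing that, on each enclosing level up to <archdesc>. The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


# Hypothetical helper (not part of EAD): nearest-ancestor lookup.
def inherited(root: ET.Element, node: ET.Element, tag: str) -> Optional[str]:
    """Text of the nearest did/<tag> at node's level or any level above it."""
    # ElementTree keeps no parent pointers, so map each child to its parent once.
    parents = {child: parent for parent in root.iter() for child in parent}
    current: Optional[ET.Element] = node
    while current is not None:
        found = current.find(f"did/{tag}")
        if found is not None:
            return " ".join("".join(found.itertext()).split())
        current = parents.get(current)
    return None


# Made-up sample finding aid.
doc = ET.fromstring(
    "<ead><archdesc level='collection'>"
    "<did><repository>Example Repository</repository>"
    "<unittitle>Jane Doe Papers</unittitle></did>"
    "<dsc type='in-depth'><c01 level='series'>"
    "<did><unittitle>Correspondence</unittitle></did>"
    "</c01></dsc></archdesc></ead>"
)
series = doc.find(".//c01")
print(inherited(doc, series, "repository"))  # Example Repository
print(inherited(doc, series, "unittitle"))   # Correspondence
```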
Imagine a typical scenario: An archivist begins encoding a finding aid by first opening the <ead> element and creating the required <eadheader>. He or she may then add some optional <frontmatter> before opening the <archdesc> element and setting its LEVEL attribute to the value "collection," "record group," "fonds," or "series," depending on which term best reflects the character of the whole unit being described in the finding aid. What then follows are data elements that describe that whole unit, including a special subset of core data elements that are gathered together under a parent element called Descriptive Identification <did>. These <did> subelements are thought to be among the most important for ensuring a good basic description of an archival unit or component. Grouping these elements together serves several purposes. It ensures that the same data elements and structure are available at every level of description within the EAD hierarchy, facilitates the retrieval or other output of a cohesive body of elements for resource discovery and recognition, and, because the elements appear together in the tag library and on software menus and templates, helps to remind encoders to capture descriptive information they may otherwise overlook.
As Figure 3 shows, the <did> element may contain, in any order, one or more of the following descriptive subelements, which are familiar mainstays of archival cataloging:
- Container <container>, identifying the number of the carton, box, folder, or other holding unit in which the archival materials are arranged and stored;
- Origination <origination>, denoting the individuals or organizations responsible for the creation or assembly of the archival materials;
- Physical Description <physdesc>, identifying the extent, dimensions, genre, form, and other physical characteristics;
- Physical Location <physloc>, identifying the stack number, shelf designation, or other storage location;
- Repository <repository>, designating the institution responsible for providing intellectual access;
- Date of the Unit <unitdate>, designating the creation dates of the archival materials;
- Identification of the Unit <unitid>, containing an accession number, classification number, lot number, or other such unique and permanent identifier; and
- Title of the Unit <unittitle>, containing the title of the archival materials at whatever level they are being described, such as collection title, series title, subseries title, file title, or item title.
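Because these subelements may be encoded in any order, a display routine can still assemble them in a fixed sequence. The sketch below, assuming XML serialization and a TYPE attribute on <container> with values such as "box" or "folder", pieces container, title, and date into one line; the function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(el: ET.Element) -> str:
    """All text inside el, with runs of whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


# Hypothetical display helper (not part of EAD).
def did_line(did: ET.Element) -> str:
    """Piece selected <did> subelements into one display line, in a fixed order."""
    containers = [
        f"{c.get('type', 'container').capitalize()} {text_of(c)}"
        for c in did.findall("container")
    ]
    title = did.find("unittitle")
    date = did.find("unitdate")
    parts = []
    if containers:
        parts.append(", ".join(containers) + ":")
    if title is not None:
        parts.append(text_of(title) + ("," if date is not None else ""))
    if date is not None:
        parts.append(text_of(date))
    return " ".join(parts)


# Made-up sample; note the date is encoded first.
did = ET.fromstring(
    "<did><unitdate>1920-1925</unitdate>"
    "<container type='box'>3</container><container type='folder'>12</container>"
    "<unittitle>Correspondence</unittitle></did>"
)
print(did_line(did))  # Box 3, Folder 12: Correspondence, 1920-1925
```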
The <did> element also provides for the use of both an Abstract <abstract> and a general Note <note> element, as well as for Digital Archival Object <dao> and Digital Archival Object Group <daogrp> elements, which may contain digital surrogates of the material being described in the finding aid. Attributes, such as LABEL, TYPE, and ENCODINGANALOG, are associated with the <did> subelements to enable more specific content designation.
The optional ENCODINGANALOG attribute permits the designation of the applicable MARC field or subfield together with the authoritative form of the data. By using ENCODINGANALOG attributes, archivists may be able to generate skeletal, collection-level MARC records automatically from EAD finding aids. Use of the analogs may also help retrieval and indexing systems identify comparable data elements in bibliographic records and finding aids. (See examples under <did> in the EAD Elements section of the tag library.)
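As a rough illustration of the skeletal-record idea, the sketch below collects every element inside the collection-level <did> that carries an ENCODINGANALOG value and sorts the pairs by field. It assumes XML serialization with a lowercase encodinganalog attribute and a local convention of writing values like "245$a"; the output is a list of pairs, not a real MARC record, and the function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD or any MARC library).
def marc_skeleton(archdesc: ET.Element) -> list:
    """Collect (MARC analog, text) pairs from the collection-level <did>."""
    did = archdesc.find("did")
    if did is None:
        return []
    pairs = []
    for el in did.iter():
        analog = el.get("encodinganalog")
        if analog:
            text = " ".join("".join(el.itertext()).split())
            pairs.append((analog, text))
    return sorted(pairs)  # order by field tag, as in a MARC record


# Made-up sample; the analog values follow an assumed local convention.
archdesc = ET.fromstring(
    "<archdesc level='collection'><did>"
    "<unittitle encodinganalog='245$a'>Jane Doe Papers</unittitle>"
    "<origination><persname encodinganalog='100'>Doe, Jane</persname></origination>"
    "<physdesc encodinganalog='300'>2 linear feet</physdesc>"
    "</did></archdesc>"
)
for field, value in marc_skeleton(archdesc):
    print(field, value)
# 100 Doe, Jane
# 245$a Jane Doe Papers
# 300 2 linear feet
```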
Having used the <did> elements to capture a basic description at the <archdesc> level, the archivist may proceed directly to a description of the unit's component parts. More likely, however, the finding aid creator will provide additional narrative information about the content, context, or extent of the whole unit. This description usually appears in prose form inside elements with tag names such as <admininfo>, <bioghist>, <scopecontent>, <organization>, and <arrangement>, which are suggestive of the categories of information typically present in traditional paper-based finding aids. For each of these categories of information, the encoder may use the Heading <head> element to provide a heading based on local preferences, which may or may not correspond to the element name. For example, the DTD permits encoders to identify a biographical note or agency history by any heading they choose (e.g., Biographical Summary, Biography, Jane Doe's Key Dates) as long as the content is correctly tagged as <bioghist>. Structurally, from an SGML perspective, the content models for these narrative-based elements are "heads" and "text," with the latter generally composed of paragraphs <p> or various types of lists <list>, including the specially created EAD <chronlist>, consisting of <chronitem>s that pair a <date> with its corresponding <event> to enable linking and tabular display. By comparison, the information within a <did> subelement is often presented as a short, labeled phrase, or perhaps several subelements are pieced together to form a simple uniform data string.
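The date-and-event pairing in <chronlist> is what makes a tabular display straightforward. The sketch below assumes XML serialization. Whether an item may group several events (for example in an event-group wrapper) is an assumption here, so the code simply collects every <event> inside each <chronitem>. The function name and the sample chronology are hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD).
def chronlist_rows(chronlist: ET.Element) -> list:
    """Pair each <chronitem>'s <date> with its <event> text(s)."""
    rows = []
    for item in chronlist.findall("chronitem"):
        date = item.findtext("date", default="").strip()
        # iter() also reaches <event> elements nested inside a grouping element
        events = ["".join(e.itertext()).strip() for e in item.iter("event")]
        rows.append((date, "; ".join(events)))
    return rows


# Made-up sample chronology.
bioghist = ET.fromstring(
    "<bioghist><head>Jane Doe's Key Dates</head><chronlist>"
    "<chronitem><date>1901</date><event>Born in Ohio</event></chronitem>"
    "<chronitem><date>1925</date><event>Graduated from college</event></chronitem>"
    "</chronlist></bioghist>"
)
rows = chronlist_rows(bioghist.find("chronlist"))
width = max(len(d) for d, _ in rows)
for date, event in rows:
    print(date.ljust(width), event)  # two columns: date, event
```

Note that the <head> ("Jane Doe's Key Dates") is left to the display layer; only the tag name <bioghist> carries the meaning.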
The <p> element is particularly useful in that it contains many subelements that enable further formatting, linking, and vocabulary control options. Its content model is especially robust, permitting a combination of text and some thirty-three other elements, which are listed in Figure 4 and are identified under <p> in the EAD Elements section of the tag library. These include features generic to most text-based products, such as abbreviations <abbr>, addresses <address>, block quotes <blockquote>, line breaks <lb>, lists <list>, numbers <num>, and tables <table> (see Figure 5 and Figure 10). It also includes the <controlaccess> subelements (see Figure 8), pointers, linking, and reference elements (see Figure 9), and other selected content-based elements. The <p> is widely available throughout <archdesc> and may be opened directly inside more than thirty other elements to enable both paragraph-style formatting of information and access to useful subelements.
Because various intellectual and economic factors will influence an institution's depth of tagging, the DTD allows for the nesting of elements to capture more detailed and specific description as desired. For example, the element called Administrative Information <admininfo>, mentioned above and outlined in Figure 6, contains descriptive background information concerning an institution's acquisition, processing, and management of a body of archival materials. The <admininfo> element designates facts about provenance, acquisition, access and reproduction restrictions, availability of microform and digital surrogates, preferred form of citation, and other descriptive details that help readers of the finding aid know how to approach the archival materials and make use of the information they find. All the specific descriptive details captured in <admininfo> have their own corresponding elements in the DTD--with tag names such as <custodhist>, <accruals>, <acqinfo>, <appraisal>, <accessrestrict>, <userestrict>, <altformavail>, <prefercite>, and <processinfo>--which may be applied individually if desired. Should such specificity not be needed, however, the archivist may elect to tag the entire body of information at the parent level, <admininfo>, and not to encode separately the text relating to each nested subelement. (See examples under <admininfo> in the EAD Elements section of the tag library.)
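Because <admininfo> may be tagged either in detail or only at the parent level, software reading it has to cope with both. Below is a minimal sketch, assuming XML serialization: it returns the specific subelement's paragraphs when present and otherwise falls back to all of <admininfo>, flagging that the answer is not exact. The function names and sample data are hypothetical, and the fallback policy is a design choice, not something EAD prescribes.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def paragraphs(el: ET.Element) -> str:
    """Join the text of every <p> inside el, which skips any <head>."""
    return " ".join(" ".join("".join(p.itertext()).split()) for p in el.iter("p"))


# Hypothetical helper (not part of EAD).
def admin_info(archdesc: ET.Element, tag: str) -> Optional[Tuple[str, bool]]:
    """Return (text, exact); exact is False when only parent-level text exists."""
    admininfo = archdesc.find("admininfo")
    if admininfo is None:
        return None
    specific = admininfo.find(tag)
    if specific is not None:
        return paragraphs(specific), True
    return paragraphs(admininfo), False  # caller decides whether to show it


# Made-up samples: detailed tagging versus parent-level tagging only.
detailed = ET.fromstring(
    "<archdesc><admininfo>"
    "<acqinfo><p>Gift of John Doe, 1990.</p></acqinfo>"
    "<accessrestrict><head>Access</head><p>Open to researchers.</p></accessrestrict>"
    "</admininfo></archdesc>"
)
coarse = ET.fromstring(
    "<archdesc><admininfo><p>Gift of John Doe, 1990. Open to researchers.</p>"
    "</admininfo></archdesc>"
)
print(admin_info(detailed, "accessrestrict"))  # ('Open to researchers.', True)
print(admin_info(coarse, "accessrestrict"))
```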
Description of Subordinate Components <dsc>
Once an archivist has completed the description of the records or papers at the highest (or unit) level, the <dsc> element may be opened, and the focus shifts to describing one or more of the unit's component parts. It is in this section of the encoded finding aid that segments of traditional finding aids such as "series," "container lists," and "calendars" are addressed. The <dsc> is sometimes presented in tabular format (7) and can assume several different forms, which are identified by the element's TYPE attribute. The TYPE attribute can be set to a value of analytic overview ("analyticover"), to identify a series or subseries description; "in-depth," to identify a listing of containers or folders, a calendar, or a listing of items; "combined," to identify instances in which the description of each series is followed immediately by a listing of containers or folders for that series; and "othertype," to identify models that do not follow any of the above-mentioned formats. (See examples under <dsc> in the EAD Elements section of the tag library.)
After the form of the <dsc> has been selected, the archival components are identified, and a LEVEL attribute may be assigned. For example, an archivist who wishes to provide a summary listing of all the series in a collection would open a <dsc>, set the TYPE attribute to "analyticover," open a <c> or <c01> Component tag (components may be numbered <c01> through <c12> to keep better track of the hierarchical levels), set the LEVEL attribute to "series," and proceed to describe the first series-level component by utilizing the same extensive set of elements that were previously available for describing the whole unit at the <archdesc> level. The same procedure would be followed again for all subsequent series-level components, after which point the <dsc> element would be closed. In general, certain <did> subelements, such as <repository> and <origination>, will not be used within a <c> because the information they contain has been encoded at the <archdesc> level and inherited by the <c>. Other <did> subelements, such as <container>, <unitdate>, and <unittitle>, will frequently be used within a <c> to encode new information or more detailed description at a lower hierarchical level.
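Numbered components encode their depth in the tag name, so the numbering can be checked against the actual nesting. A validating parser working from the DTD may already catch such errors; the sketch below is a lightweight check one might run during hand-encoding or conversion, assuming XML serialization. The function name and sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED = re.compile(r"c(\d\d)")


# Hypothetical helper (not part of EAD).
def numbering_errors(dsc: ET.Element) -> list:
    """List numbered components whose number does not match their nesting depth."""
    errors = []

    def walk(el: ET.Element, depth: int) -> None:
        for child in el:
            m = NUMBERED.fullmatch(child.tag)
            if m:
                if int(m.group(1)) != depth:
                    errors.append(f"<{child.tag}> found at depth {depth}")
                walk(child, depth + 1)  # its components sit one level deeper

    walk(dsc, 1)  # components directly inside <dsc> are first-level
    return errors


# Made-up sample: the second series wrongly contains a <c03>.
dsc = ET.fromstring(
    "<dsc type='analyticover'>"
    "<c01 level='series'><did><unittitle>Correspondence</unittitle></did>"
    "<c02 level='subseries'/></c01>"
    "<c01 level='series'><c03/></c01>"
    "</dsc>"
)
print(numbering_errors(dsc))  # ['<c03> found at depth 2']
```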
A second <dsc> might then be opened with a TYPE attribute set to "in-depth" so that a container list can be presented. Each series, subseries, file, or item represented in the container list would be tagged as recursive (8), nested components, possibly with LEVEL attributes set on the high-level components to identify their hierarchical order within the collection, fonds, or record group. As in the series description, information about each <c> may be identified, if desired, by utilizing the full complement of descriptive elements. This structure of endlessly nested components inside a <dsc>, and further inside <archdesc>, provides for descriptive information that is inherited from one level to another and that shares or repeats the same essential data elements.
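Recursive nesting means one routine can walk a container list of any depth. The sketch below, assuming XML serialization, yields each component's depth, LEVEL value, and unit title; it treats both unnumbered <c> and two-digit numbered tags such as <c01> as components. The function names and sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def is_component(tag: str) -> bool:
    """True for <c> and for two-digit numbered forms such as <c01>."""
    return re.fullmatch(r"c(\d\d)?", tag) is not None


# Hypothetical helper (not part of EAD).
def outline(el: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Recursively yield (depth, LEVEL value, unit title) for nested components."""
    for child in el:
        if is_component(child.tag):
            title = child.findtext("did/unittitle", default="(untitled)")
            yield depth, child.get("level", ""), title
            yield from outline(child, depth + 1)  # same code handles any depth


# Made-up sample container list using unnumbered components.
dsc = ET.fromstring(
    "<dsc type='in-depth'>"
    "<c level='series'><did><unittitle>Correspondence</unittitle></did>"
    "<c level='file'><did><unittitle>Letters, 1920</unittitle></did>"
    "<c level='item'><did><unittitle>Letter to John Smith</unittitle></did></c>"
    "</c></c></dsc>"
)
for depth, level, title in outline(dsc):
    print("  " * depth + f"{title} [{level}]")
```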
An alternative to encoding a <dsc type="analyticover"> followed by a <dsc type="in-depth"> would be to simply use the <dsc type="combined">. The combined model is perhaps a purer manifestation of an unfolding hierarchical description, in that the first Component <c01> (e.g., a series) is only encoded once, followed immediately by a fuller description of its nested parts (e.g., subseries, files, and items). The combined model avoids the potential confusion of machine-processing identical information that has been encoded twice in the same document, a situation that occurs in the two-<dsc> approach. Depending on the sophistication of a system's searching and processing capabilities, the two-<dsc> approach may hamper the ability to show a relationship between the description of the <c01> and the description of its parts. On the other hand, the two-<dsc> approach not only readily accommodates a legacy data structure found in many existing finding aids but also replicates the functionality that structure provided. For example, many archivists have found it extremely helpful to assemble in one spot all the first-level component descriptions to provide researchers with a quick overview of the archival unit's content and organization and to permit ready comparisons between components. Flipping through a long paper guide or scrolling and jumping through an electronic finding aid to locate all the first-level summaries is a drawback of the combined <dsc>, which an online delivery system would need to address.
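One way a delivery system might address that drawback is to derive the overview on the fly from a combined <dsc>, so the series descriptions are still encoded only once. The sketch below assumes XML serialization; it copies each first-level component and strips its nested components. The function names are hypothetical, and whether a derived element should be labeled type="analyticover" is our choice for illustration.

```python
import copy
import re
import xml.etree.ElementTree as ET


def is_component(tag: str) -> bool:
    """True for <c> and for two-digit numbered forms such as <c01>."""
    return re.fullmatch(r"c(\d\d)?", tag) is not None


# Hypothetical helper (not part of EAD).
def analytic_overview(combined: ET.Element) -> ET.Element:
    """Derive a first-level summary from a combined <dsc> without altering it."""
    overview = ET.Element("dsc", {"type": "analyticover"})
    for comp in combined:
        if not is_component(comp.tag):
            continue  # e.g. a <head> for the dsc itself
        summary = copy.deepcopy(comp)
        for sub in list(summary):  # list() so removal does not disturb iteration
            if is_component(sub.tag):
                summary.remove(sub)  # drop nested parts; keep did, scopecontent
        overview.append(summary)
    return overview


# Made-up sample combined description.
combined = ET.fromstring(
    "<dsc type='combined'>"
    "<c01 level='series'><did><unittitle>Correspondence</unittitle></did>"
    "<scopecontent><p>Letters to family and colleagues.</p></scopecontent>"
    "<c02 level='file'><did><unittitle>Letters, 1920</unittitle></did></c02></c01>"
    "<c01 level='series'><did><unittitle>Diaries</unittitle></did>"
    "<c02 level='file'><did><unittitle>Diary, 1921</unittitle></did></c02></c01>"
    "</dsc>"
)
overview = analytic_overview(combined)
print([c.findtext("did/unittitle") for c in overview])  # ['Correspondence', 'Diaries']
```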
Adjunct Descriptive Data <add> and Other Descriptive Data <odd>
Two other notable elements within <archdesc> and <c> are Adjunct Descriptive Data <add> and Other Descriptive Data <odd>. The <add> element (see Figure 7) is designed to encode supplemental descriptive information that facilitates use of the materials featured in the finding aid. This includes additional access tools to the materials, such as indexes, file plans, and other finding aids, as well as descriptions or lists of materials separated from or related to those described in the finding aid. Archivists may elect to tag all the adjunct information simply as an <add>, or they may open the <add> element and encode each piece of information with its specific corresponding tag, such as <bibliography>, <fileplan>, <index>, <otherfindaid>, <relatedmaterial>, and <separatedmaterial>, as listed in Figure 7. As a subelement of both <archdesc> and <c>, <add> may appear throughout a finding aid in whatever information sequence best suits the repository's needs. For many encoders, however, the best sequence will likely be to group all the <add> elements together near the end of the finding aid. (See examples under <add> in the EAD Elements section of the tag library.)
Although EAD was designed for older, paper-based finding aids as well as newer machine-readable ones, it could not be expected to accommodate all existing practices. When converting existing finding aids to an ideal EAD markup, some shifting of text or addition of data may be necessary to conform to the DTD's sequencing of elements and the consignment of certain elements to specific settings. A special element called Other Descriptive Data <odd> helps to minimize conversion difficulties by designating as "other" any information that may not fit into EAD's otherwise distinct categories.
Controlled Access Headings <controlaccess>
Aside from encoding the major structural parts of a finding aid and designating the core descriptive data about the unit and its components, users of EAD also have the option of identifying character strings throughout the finding aid that are likely to be the objects of searches, such as personal, corporate, family, and geographic names; occupations; functions; form and genre terms; subjects; and titles. All of these elements (<name>, <persname>, <corpname>, <famname>, <geogname>, <occupation>, <function>, <genreform>, <subject>, and <title>) permit, through the use of attributes, the designation of MARC and ISAD(G) encoding analogs and authorized forms. Additional optional attributes allow for specifying the role or relationship of persons and corporate bodies (e.g., author, editor, photographer) and the source of the controlled vocabulary terms used (e.g., Library of Congress Subject Headings, Library of Congress Name Authority File, Art and Architecture Thesaurus, Dictionary of Occupational Titles). Although the DTD permits liberal access to these elements throughout the finding aid, especially within the <p> and <unittitle> elements, special mention should be made of the ability to bundle them together under the parent element Controlled Access Headings <controlaccess> as shown in Figure 8. (See examples under <controlaccess> in the EAD Elements section of the tag library.)
The <controlaccess> element was created specifically to enable authority-controlled searching across finding aids on a computer network. Users are likely to approach online finding aids via a variety of avenues. Some may search a repository's online catalog, locate relevant entries, and follow links from those entries to online versions of finding aids. Others may start by searching the finding aids directly, bypassing the catalog and losing the advantage of the authority-controlled search terms contained therein. The <controlaccess> element is designed to replicate in a finding aid the collection-level search terms found in the 1xx, 6xx and 7xx fields of MARC catalog records. Finding aid searches limited to the <controlaccess> element will improve the likelihood of locating strong sources of information on a desired subject, because access terms will have been entered in a consistent and authorized form across finding aids, and also because only the most significant terms are likely to have been selected for encoding.
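A search limited to <controlaccess> can be sketched as follows, assuming XML serialization and that headings are matched by exact, case-insensitive comparison of the authorized form (a simplification; real systems may normalize punctuation or use the authority file itself). The function names and sample finding aids are hypothetical. Note that the second sample mentions the same name only in running text, so it is not returned.

```python
import xml.etree.ElementTree as ET

ACCESS_TAGS = {"name", "persname", "corpname", "famname", "geogname",
               "occupation", "function", "genreform", "subject", "title"}


def norm(text: str) -> str:
    """Case-fold and collapse whitespace for comparison."""
    return " ".join(text.split()).casefold()


# Hypothetical helper (not part of EAD).
def search_controlaccess(finding_aids: dict, term: str) -> list:
    """IDs of finding aids whose <controlaccess> headings match term exactly."""
    wanted = norm(term)
    hits = []
    for eadid, ead in finding_aids.items():
        for ca in ead.iter("controlaccess"):  # also visits nested <controlaccess>
            if any(el.tag in ACCESS_TAGS and norm("".join(el.itertext())) == wanted
                   for el in ca):
                hits.append(eadid)
                break  # one match per finding aid is enough
    return sorted(hits)


# Made-up sample finding aids.
fa1 = ET.fromstring(
    "<ead><archdesc><controlaccess>"
    "<persname>Doe, Jane</persname><subject>Women's rights</subject>"
    "</controlaccess></archdesc></ead>"
)
fa2 = ET.fromstring(
    "<ead><archdesc><bioghist><p>Letters from <persname>Doe, Jane</persname>.</p>"
    "</bioghist></archdesc></ead>"
)
print(search_controlaccess({"fa1": fa1, "fa2": fa2}, "doe, jane"))  # ['fa1']
```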
Pointer, Reference, and Linking Elements
As shown in Figure 9, pointer and linking references are available for simple (one-way) links utilizing Archival Reference <archref>, Bibliographic Reference <bibref>, Digital Archival Object <dao>, Extended Pointer <extptr>, Extended Reference <extref>, Pointer <ptr>, and Reference <ref>. When multidirectional (extended) links are required, Digital Archival Object Location <daoloc>, Extended Pointer Location <extptrloc>, Extended Reference Location <extrefloc>, Pointer Location <ptrloc>, and Reference Location <refloc> are used. These location elements are wrapped together in a Digital Archival Object Group <daogrp>, a Pointer Group <ptrgrp>, or a Linking Group <linkgrp>. The linking elements are available in <p> and elsewhere in the DTD to enable EAD to support hypertext and hypermedia. This paves the way for finding aids to become more dynamic in an online environment and facilitates the capability to link electronic finding aids to digital representations of the archival materials described therein.
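Internal one-way links depend on each pointer's target actually existing in the document. The sketch below covers only simple <ptr> and <ref> links, assuming XML serialization with lowercase "target" and "id" attributes; extended links and external references are out of scope. The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD).
def dangling_targets(root: ET.Element) -> list:
    """TARGET values on <ptr>/<ref> that match no ID in the same document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [
        el.get("target")
        for el in root.iter()
        if el.tag in ("ptr", "ref") and el.get("target") and el.get("target") not in ids
    ]


# Made-up sample: one good reference, one broken pointer.
doc = ET.fromstring(
    "<ead><archdesc><scopecontent><p>See <ref target='ser1'>Series 1</ref> "
    "and <ptr target='ser9'/>.</p></scopecontent>"
    "<dsc><c01 id='ser1' level='series'><did><unittitle>Correspondence</unittitle>"
    "</did></c01></dsc></archdesc></ead>"
)
print(dangling_targets(doc))  # ['ser9']
```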
Table Elements
In addition to the columnar displays discussed earlier under Description of Subordinate Components <dsc>, finding aids often include other kinds of text that is presented in tabular format, e.g., multicolumn chronology lists in biographical notes or highly structured file plans and other Adjunct Descriptive Data <add>. In many instances, it may be possible to reproduce this tabular layout visually on computer monitors or in print through the use of style sheets. However, in order to achieve the desired visual presentation, it sometimes may be necessary to embed a table structure within the EAD markup of certain complex documents. The EAD DTD makes this possible by defining a superstructure of columns, rows, and cells containing data in the manner of a spreadsheet. The Table <table> element (see Figure 10) contains one or more Table Group <tgroup> elements, the exact number of which depends on the number of times the column specification changes within the table. Each <tgroup> element bundles five closely interrelated subelements that work together to form all or part of the table: The Table Column Specification <colspec> and the Spanned Column Specification <spanspec> define the columnar layout of the table; the Table Head <thead> and Table Foot <tfoot> elements provide headers and footers for the table; and the Table Body <tbody> element wraps the text or information within the table. The <tbody> element contains Row <row> elements, which define the contents of each row in the table, and the <row> elements, in turn, contain Table Entry <entry> elements, which encapsulate the contents of each cell in the table. (A cell is the intersection of a row and a column.) It is important to recognize that the table elements are used primarily for formatting purposes. The <row> and <entry> markup does not replace the content encoding of the finding aid but is overlaid on top of it, in the same way that the <drow> and <dentry> tabular markup described in footnote 7 is overlaid on top of the content markup of the <dsc> elements.
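To show how the nesting fits together, the sketch below flattens a <table> into lists of cell strings, header rows first. It assumes XML serialization and deliberately ignores <colspec>/<spanspec> spans and any <tfoot>, so it suits only simple grids. The function name and sample table are hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of EAD); column spans are not handled.
def table_rows(table: ET.Element) -> list:
    """Flatten <table> into lists of cell strings: header rows, then body rows."""
    rows = []
    for tgroup in table.findall("tgroup"):  # one tgroup per column specification
        for section in ("thead", "tbody"):
            for row in tgroup.findall(f"{section}/row"):
                rows.append([" ".join("".join(e.itertext()).split())
                             for e in row.findall("entry")])
    return rows


# Made-up sample two-column table.
table = ET.fromstring(
    "<table><tgroup cols='2'>"
    "<colspec colname='box'/><colspec colname='contents'/>"
    "<thead><row><entry>Box</entry><entry>Contents</entry></row></thead>"
    "<tbody><row><entry>1</entry><entry>Correspondence, 1920-1925</entry></row>"
    "<row><entry>2</entry><entry>Diaries</entry></row></tbody>"
    "</tgroup></table>"
)
for row in table_rows(table):
    print(" | ".join(row))
```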
- 6. This overview is adapted from Janice E. Ruth's article, "Encoded Archival Description: A Structural Overview," American Archivist 60 (Summer 1997).
- 7. Columnar displays of the <dsc> elements may be achieved in two ways. SGML stylesheets may be used to manipulate intellectual content elements, such as <c> and <did>, for basic columnar output. For more precise columnar layouts, including greater control of indentations, encoders may overlay the intellectual markup with a special set of tabular display elements created specifically for EAD: Display Entry <dentry>, Display Row <drow>, and Table Specification <tspec>. These elements replace the use of <did>. Encoders who wish to use these display elements must first make a minor modification to their copies of the DTD in order to access the elements. See <dentry>, <drow>, and <tspec> in the EAD Elements section of the tag library for additional information.
- 8. The term "recursive" is used to refer to a procedure which may repeat itself indefinitely. In SGML usage this indicates an element which may contain or be nested within itself.
Copyright Society of American Archivists, 1998.
All Rights Reserved.
The Library of Congress
(10/12/99)