EAD Application Guidelines for Version 1.0

Chapter 1. Setting EAD in Context: Archival Description and SGML

1.1. Introduction
1.2. The Evolution of Archival Descriptive Standards
1.3. The Evolution of Archival Information on the Internet
1.4. Why SGML?
1.5. What is XML?
1.6. The Relationship between MARC and EAD
1.7. Other Resources for Learning about EAD
1.7.1. Readings
1.7.2. Web Sites
1.7.3. Training Opportunities

1.1. Introduction

Encoded Archival Description (EAD) is a data structure standard for preserving the hierarchy and designating the content of descriptive guides to archival holdings worldwide. It enables Internet delivery of these guides and also ensures their permanence by providing a stable, non-proprietary data storage environment from which data can be transferred to other software environments as necessary. In technical terms, EAD comprises a Document Type Definition (DTD) for encoding archival finding aids that is written following the syntactic rules of Standard Generalized Markup Language (SGML) and Extensible Markup Language (XML).

The EAD Tag Library⁽⁸⁾ identifies the names and definitions of all EAD data elements defined in the DTD. These EAD Application Guidelines provide interpretative guidance to enable archivists to apply the DTD accurately and effectively when encoding their repositories' finding aids. These three documents (the DTD, the Tag Library, and the Application Guidelines) together comprise complete documentation for EAD Version 1.0.

This first chapter places EAD within the broader context of other archival descriptive standards and explains the choice of SGML as its technical environment. It emphasizes that while EAD's development began in the United States and its structure is rooted in this country's descriptive practices, EAD's developers incorporated significant concepts from the international descriptive framework provided by the General International Standard Archival Description (ISAD(G))⁽⁹⁾ and from national descriptive content standards such as the Canadian Rules for Archival Description (RAD).⁽¹⁰⁾ In addition, EAD elements were assigned language-neutral nomenclature designed to circumvent terminological differences and thereby encourage international application and acceptance of the DTD.

The EAD development process has been thoroughly documented elsewhere,⁽¹¹⁾ but one point is important to emphasize in the context of these Guidelines, namely that both the philosophical underpinnings and the structural particulars of EAD are firmly rooted in archival principles, tradition, and theory. The EAD development group analyzed archival finding aids as documents, as well as the descriptive principles embodied in the aforementioned ISAD(G) framework, RAD, and Archives, Personal Papers, and Manuscripts (APPM);⁽¹²⁾ from this the group developed and articulated a set of design principles. These principles provided a conceptual foundation intended to ensure that EAD would remain grounded in the realities of past and current theory and practice.⁽¹³⁾

One key design principle states that EAD will accommodate both the creation of new finding aids and the conversion of existing (or legacy) data. EAD is indeed sufficiently flexible to achieve this, but at the same time, it seeks to foster structural uniformity across finding aids in the belief that adherence to a consistent data model increases successful document interchange among repositories and that greater standardization of finding aids would generally be a positive development. Another important design principle specifies that while a large and diverse universe of archival descriptive data exists, EAD is intended to accommodate data that supports description, control, navigation, indexing, and online or print presentation, but not necessarily data that is only intended to address local collection management needs.

1.2. The Evolution of Archival Descriptive Standards⁽¹⁴⁾

Archives, libraries, museums, and other cultural institutions exist to preserve and protect the documentary record of human activity and to make it available for research, study, and evidentiary purposes. To carry out this mission, archival repositories have long devoted significant effort to arrangement and description of their holdings, routinely preparing detailed guides to collections so that users can locate materials relevant to their interests. Until recently, however, many of these finding aids were unpublished and therefore available only within a single repository. Archivists have long sought affordable and effective means of making their resources more widely known.

In the United States, for example, some repositories have prepared published summaries of their holdings, and during the economic depression of the 1930s, the government funded a major historical records survey as a work relief project. In the late 1950s, more systematic efforts were initiated to assemble summary descriptions of resources nationwide. The commencement in 1959 of the multivolume National Union Catalog of Manuscript Collections,⁽¹⁵⁾ followed by the 1961 publication of Hamer's path-breaking Guide to Archives and Manuscripts in the United States,⁽¹⁶⁾ helped identify the location and general scope of the manuscript collections within participating repositories.⁽¹⁷⁾

Helpful as these paper-based projects were, however, it was not until the advent of the MARC AMC format in the late 1970s that repositories in the United States gained the ability to disseminate information about their holdings more widely via national bibliographic systems. Under MARC AMC, the addition of controlled access points⁽¹⁸⁾ to the resulting catalog records enabled archival holdings to be searched with the same flexibility and precision as published materials. The advances of MARC AMC notwithstanding, the MARC records could accommodate only summary information about holdings; they could not absorb all the data in a detailed finding aid. They could, however, point to the existence of detailed paper-based finding aids. Nevertheless, it remained problematic that these detailed finding aids were not yet part of a shared online environment. This was especially frustrating because many finding aids were gradually being produced using word processing or database systems.

Although numerous American repositories embraced the MARC AMC format, most European institutions did not, working instead toward development of ISAD(G), which was adopted by the International Council on Archives (ICA) in 1993. ISAD(G) defines twenty-six elements "that may be combined to constitute the description of an archival entity" at any level.⁽¹⁹⁾ ISAD(G) also provides a set of definitions for archival terminology and formulates four general principles to guide archivists in multilevel description. The primary motivating factor behind the development of this standard was the recognition that some level of descriptive consistency would be required in order to facilitate the exchange and retrieval of archival information in unified, multirepository or multinational information systems.

EAD is a more specific structural standard than ISAD(G) in that EAD is focused on the particular type of archival finding aid typically called an inventory or register. As mentioned earlier, however, the developers of EAD looked closely at ISAD(G) and made certain that its elements were accommodated within the EAD data structure.⁽²⁰⁾ Moreover, EAD is completely compatible with ISAD(G)'s principles of multilevel description. This comfortable fit between the ISAD(G) and EAD data structures is a primary reason why interest in EAD has been strongly international in scope.

Archivists who lack experience thinking explicitly about multilevel description when implementing a hierarchical data structure such as EAD will find that ISAD(G) provides a vital framework within which to situate EAD-related decision making.

1.3. The Evolution of Archival Information on the Internet⁽²¹⁾

The emergence of EAD, in concert with the growth of the Internet, now enables repositories worldwide to disseminate information about their holdings more easily. Systems can be constructed to enable researchers to search across all the collections of a single repository (and in union systems, of multiple repositories) in order to identify and locate resources on any topic of interest. In addition, users in some environments can now navigate via hyperlinks from broad-based subject or name searches of MARC records, to EAD finding aids, to digital representations of archival materials themselves.

This new environment provides us with an opportunity to reconceptualize how we deliver information to our users, both traditional archival users and entirely new potential audiences. Archivists were quick to recognize that the Internet provided opportunities for electronic dissemination of finding aids, and many rapidly established Gopher sites for that purpose. The results of these experiments were tantalizing but ultimately discouraging. Gopher software could manage finding aids only as simple text files lacking structural or typographical formatting and important features such as footnotes; this made lengthy finding aids difficult to navigate. Moreover, no mechanism existed to link the finding aids to any corresponding MARC records. A user searching a repository's online catalog therefore had to exit the catalog and log into the Gopher site to verify whether a finding aid existed. (Once Web-based online catalogs became available, this particular problem was eliminated even for repositories that continued to use Gopher.)

The emergence of the World Wide Web in the early 1990s offered significant advantages over Gopher. HyperText Markup Language (HTML), the SGML DTD in which Web-based documents are currently encoded, furnished the mechanism to display finding aids with additional typographical nuances and navigational techniques. Moreover, the essence of the Web (the ability to create dynamic hyperlinks among documents stored at different locations) made it possible, particularly with the appearance of Web-based online catalogs, to link a MARC record to its corresponding finding aid.

It soon became clear, however, that HTML also has significant limitations. The principal problem is that HTML is designed to provide only procedural encoding to facilitate improved layout and appearance; the intellectual structure or content of documents cannot be meaningfully encoded. For example, HTML easily represents features such as differing point sizes for headings or italics for formal titles, but it cannot distinguish a scope and content note from a biographical summary, a personal name from a geographic name, or a title from a date. Thus, while HTML can reproduce the visual appearance of a finding aid, it can neither represent nor permanently store the finding aid's complex content and structure. As a result, HTML cannot support sophisticated searching or navigation, nor can it ensure data permanence or facilitate future data migration. Moreover, although the basic rules and structure of HTML are relatively stable, its development environment is quite volatile and idiosyncratic, lacking the rigor of standards that is essential to successful information exchange and data migration.
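To make the contrast concrete, the following minimal Python sketch (using the standard library's xml.etree.ElementTree) marks up one invented catalog sentence twice: once in HTML, where only the italics of the title are recorded, and once with EAD-style elements (<persname>, <geogname>, <title>, <date>). The sentence, names, and dates are hypothetical. Grouping each version's elements by tag shows that only the EAD-style version lets software pick out the personal name or the place.

```python
import xml.etree.ElementTree as ET

# One invented sentence, marked up two ways.
# HTML records only typography: the title is italicized, nothing else is marked.
HTML_VERSION = (
    "<p>Letters of Mary Ellis written from Richmond about "
    "<i>The River Road</i>, 1902.</p>"
)
# EAD-style markup records what each phrase is.
EAD_VERSION = (
    "<p>Letters of <persname>Mary Ellis</persname> written from "
    "<geogname>Richmond</geogname> about <title>The River Road</title>, "
    "<date>1902</date>.</p>"
)


def phrases_by_tag(markup: str) -> dict:
    """Group the text of each element inside the paragraph by tag name."""
    root = ET.fromstring(markup)
    groups = {}
    for child in root:
        groups.setdefault(child.tag, []).append(child.text or "")
    return groups


html_groups = phrases_by_tag(HTML_VERSION)
ead_groups = phrases_by_tag(EAD_VERSION)

print(html_groups)  # {'i': ['The River Road']}
print(ead_groups)
# {'persname': ['Mary Ellis'], 'geogname': ['Richmond'],
#  'title': ['The River Road'], 'date': ['1902']}
```

In the HTML version the name, the place, and the date are indistinguishable from surrounding text, so no search system could index them separately.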

1.4. Why SGML?

Working as director of the Berkeley Finding Aid Project, the precursor to EAD, Daniel Pitti determined that SGML offered a promising framework for overcoming the flaws of Gopher and HTML for delivering archival finding aids via the Internet. Not only does SGML enable full structural and content encoding, but in its inherently hierarchical approach to data structure it mirrors the information hierarchies that have long been a fundamental characteristic of archival description. Moreover, through the earlier implementation of standards such as MARC AMC and the various national content standards mentioned earlier, the archival community had learned the value of using community-based open standards. Thus, SGML was compelling because it is a standard (ISO 8879), it is open (in the sense of being independent of any particular community or proprietary software application), and it is possible to design an SGML application specifically focused on the characteristics of archival finding aids, rather than having to use a more generalized scheme designed for some other type of document.

One example of a more generalized scheme is the Text Encoding Initiative (TEI), an international cooperative effort to develop an SGML DTD for scholarly texts.⁽²²⁾ Pitti looked closely at TEI because it was an important humanities-based computing initiative, but he ultimately found its goals incompatible with the needs of finding aids. This was because TEI was designed to encode literary and other texts as objects of study, and such documents are very different from the type of descriptive metadata that archival finding aids represent. As a result there are many elements in TEI that are not needed in EAD. More significantly, key elements required for finding aids are not available in TEI. EAD was, however, made as consistent with TEI as possible: the basic TEI header structure was incorporated into EAD,⁽²³⁾ and element names and attributes conflict as little as possible. Moreover, there has been active communication between the EAD and TEI developers in order to ensure that EAD remains a compatible part of the larger universe of humanities-based computing initiatives.

As noted above, SGML is inherently hierarchical. EAD reflects the ability of a well-crafted SGML DTD to identify the constituent intellectual and physical parts of a predominantly text-based document as distinct fields or elements, and then to nest component parts, or subelements, within them. This nesting capability allows the encoder of a finding aid (and subsequently a researcher using the encoded finding aid online) to work first with high-level elements that reflect an overview of the finding aid, and then to unfold progressively more detailed sections. Conversely, certain browser software can enable a user to search an EAD finding aid directly at item- or folder-level, then to broaden or contextualize the search by examining other items contained at the same level, or to move further up in the hierarchy to such elements as a scope and content note for a particular series or for an entire collection.
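The nesting described above can be sketched in a few lines of Python. The fragment below is a hypothetical, heavily simplified EAD-style finding aid (a collection, one series as <c01>, and two files as <c02>); it omits the <eadheader> and other required parts, so it is illustrative rather than valid EAD. The hypothetical helper `outline` unfolds the hierarchy from the top down, while `context_path` starts from a folder-level search hit and walks upward to the series and collection, much as a browser might.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD-style fragment (not a valid EAD document):
# collection -> series (<c01>) -> files (<c02>).
FINDING_AID = """
<archdesc level="collection">
  <did><unittitle>Ellis family papers</unittitle></did>
  <scopecontent><p>Correspondence and diaries of the Ellis family.</p></scopecontent>
  <dsc>
    <c01 level="series">
      <did><unittitle>Series 1: Correspondence</unittitle></did>
      <scopecontent><p>Letters received by Mary Ellis.</p></scopecontent>
      <c02 level="file">
        <did><container type="box">1</container><container type="folder">3</container>
          <unittitle>Letters from Richmond</unittitle></did>
      </c02>
      <c02 level="file">
        <did><container type="box">1</container><container type="folder">4</container>
          <unittitle>Letters from Boston</unittitle></did>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""

COMPONENT_TAGS = {"c01", "c02", "c03"}

root = ET.fromstring(FINDING_AID)
# ElementTree elements do not point to their parents, so map them once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def title(elem):
    """Return the <unittitle> in an element's own <did>, or ''."""
    return elem.findtext("did/unittitle", default="")


def child_components(elem):
    """Components of <archdesc> sit inside <dsc>; deeper ones nest directly."""
    holder = elem.find("dsc") if elem.tag == "archdesc" else elem
    if holder is None:
        return []
    return [c for c in holder if c.tag in COMPONENT_TAGS]


def outline(elem, depth=0):
    """Unfold the hierarchy top-down, indenting each level."""
    lines = ["  " * depth + title(elem)]
    for comp in child_components(elem):
        lines.extend(outline(comp, depth + 1))
    return lines


def context_path(elem):
    """Walk upward from a component to the collection, collecting titles."""
    path = []
    while elem is not None:
        if elem.tag == "archdesc" or elem.tag in COMPONENT_TAGS:
            path.append(title(elem))
        elem = parent_of.get(elem)
    return list(reversed(path))


print("\n".join(outline(root)))

# Search directly at file level, then broaden to siblings and the series note.
hit = next(c for c in root.iter("c02") if "Richmond" in title(c))
series = parent_of[hit]
print(context_path(hit))
print([title(c) for c in child_components(series)])
print(series.findtext("scopecontent/p"))
```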

Employing the principle of inheritance, SGML enables elements at a lower level in a hierarchy to inherit the information encoded in higher-level elements; this complies with the ISAD(G) rule regarding the nonrepetition of information.⁽²⁴⁾ This means that an encoder need not repeat descriptive data that already was entered at a higher level within the finding aid. Inheritance is illustrated in chapter 3, particularly in the figures in section 3.5.2.5.
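Software that processes a finding aid can honor this principle by looking upward whenever a component lacks a value. The following Python sketch applies a hypothetical helper, `inherited`, to an invented fragment: it returns the nearest value found in the component's own <did> or in an ancestor's, together with the level it came from. It assumes inheritance is resolved at display or search time; actual systems may apply different rules about which elements are inheritable.

```python
import xml.etree.ElementTree as ET

# Invented fragment: <origination> is recorded once at collection level and
# <unitdate> at series level; neither is repeated in the file-level <c02>.
FINDING_AID = """
<archdesc level="collection">
  <did>
    <unittitle>Ellis family papers</unittitle>
    <origination><famname>Ellis family</famname></origination>
    <unitdate>1850-1920</unitdate>
  </did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Series 1: Correspondence</unittitle>
           <unitdate>1880-1915</unitdate></did>
      <c02 level="file">
        <did><unittitle>Letters from Richmond</unittitle></did>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""

root = ET.fromstring(FINDING_AID)
parent_of = {child: parent for parent in root.iter() for child in parent}


def inherited(elem, name):
    """Return (text, level) from the nearest self-or-ancestor <did>/<name>.

    Returns (None, None) if no level of the hierarchy supplies the element.
    """
    while elem is not None:
        found = elem.find("did/" + name)
        if found is not None:
            return "".join(found.itertext()).strip(), elem.get("level")
        elem = parent_of.get(elem)
    return None, None


folder = root.find(".//c02")
print(inherited(folder, "unittitle"))    # ('Letters from Richmond', 'file')
print(inherited(folder, "unitdate"))     # ('1880-1915', 'series')
print(inherited(folder, "origination"))  # ('Ellis family', 'collection')
```

Note that the nearest value wins: the folder takes the series dates rather than the broader collection dates.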

1.5. What is XML?

In 1996 the World Wide Web Consortium founded the XML Working Group to write a set of specifications to enable use of SGML DTDs other than HTML on the Web.⁽²⁵⁾ This need was rooted in HTML's inability to support intellectual encoding of data.

To make documents encoded with SGML DTDs deliverable over the Web, XML eliminates some of SGML's complexities; EAD relied on few of these complexities and so was easily made XML-compliant. The full implications of XML with respect to EAD implementation are covered in section 4.3.2 and in chapter 6.
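Some of these simplifications can be checked mechanically. SGML allows a DTD to declare certain end tags omissible and, in its reference concrete syntax, treats element names as case-insensitive; XML requires every element to be explicitly closed and treats names as case-sensitive. The sketch below uses Python's standard XML parser and tests only well-formedness, not validity against the EAD DTD; the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# SGML-style: the </unittitle> end tag is omitted, as an SGML DTD may permit.
SGML_STYLE = "<c01><did><unittitle>Series 1</did></c01>"
# XML requires every element to be closed explicitly.
XML_STYLE = "<c01><did><unittitle>Series 1</unittitle></did></c01>"
# XML names are case-sensitive, so <C01> and </c01> do not match.
MIXED_CASE = "<C01><did></did></c01>"

print(is_well_formed(SGML_STYLE))  # False
print(is_well_formed(XML_STYLE))   # True
print(is_well_formed(MIXED_CASE))  # False
```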

XML was adopted by the World Wide Web Consortium as a Web standard in 1998. Version 5.0 of Microsoft's Internet Explorer browser supports XML documents, and as of early 1999, Netscape had incorporated XML into the beta versions of its next browser release.

1.6. The Relationship between MARC and EAD

As mentioned in section 1.2, MARC records for archival materials are summaries of the more detailed information usually found in finding aids; this abridgement is necessary because a MARC record has a length limit that generally accommodates only a collection-level description. Many archivists have therefore questioned whether MARC cataloging has become redundant or unnecessary now that EAD exists. While this is a logical question, it is important to note that the inclusion of archival catalog records in integrated online catalogs enables many users to locate archival resources more easily than they might otherwise. Until cross-domain resource discovery is more developed than at present, the value of maintaining archival information (however summary it might be) in these integrated systems in order to bring primary sources to the attention of library catalog users cannot be overstated.

Some questions surrounding the coexistence of MARC and EAD derive from two aspects of MARC implementation that have concerned some archivists: first, the fact that a MARC record is just a summary, not the complete finding aid; and second, that the preparation of a MARC record adds one more resource-intensive step to the arrangement and description of archival materials. EAD seeks to address both of these concerns by identifying the relationships between MARC data elements and their counterparts within encoded finding aids. This is achieved by specifying encoding analogs for EAD elements that correspond directly to specific MARC fields (see section 3.5.3.1 for details).
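As a sketch of how encoding analogs expose these relationships, the Python below lists every element in an invented collection-level description that carries an `encodinganalog` attribute, together with the MARC field it names. The MARC tags in the sample (100, 245 $a and $f, 300 $a, 650) are illustrative choices; a repository's actual analogs should follow the crosswalk in appendix B.

```python
import xml.etree.ElementTree as ET

# Invented collection-level description; each encodinganalog value names the
# MARC field that the EAD element corresponds to.
ARCHDESC = """
<archdesc level="collection">
  <did>
    <origination>
      <persname encodinganalog="100">Ellis, Mary, 1860-1931</persname>
    </origination>
    <unittitle encodinganalog="245$a">Mary Ellis papers</unittitle>
    <unitdate encodinganalog="245$f">1880-1931</unitdate>
    <physdesc encodinganalog="300$a">2 linear ft.</physdesc>
  </did>
  <controlaccess>
    <subject encodinganalog="650">Women--Correspondence.</subject>
  </controlaccess>
</archdesc>
"""


def encoding_analogs(markup):
    """Return (EAD element, MARC field, text) for each element with an analog."""
    root = ET.fromstring(markup)
    return [
        (elem.tag, elem.get("encodinganalog"), " ".join("".join(elem.itertext()).split()))
        for elem in root.iter()
        if elem.get("encodinganalog")
    ]


for ead_tag, marc_field, text in encoding_analogs(ARCHDESC):
    print(f"<{ead_tag}> -> MARC {marc_field}: {text}")
# <persname> -> MARC 100: Ellis, Mary, 1860-1931
# <unittitle> -> MARC 245$a: Mary Ellis papers
# <unitdate> -> MARC 245$f: 1880-1931
# <physdesc> -> MARC 300$a: 2 linear ft.
# <subject> -> MARC 650: Women--Correspondence.
```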

The use of encoding analogs provides the potential for repositories to consolidate EAD encoding and MARC cataloging into a single activity by generating a basic MARC record automatically from EAD; the opposite also can be accomplished by importing a MARC record into an EAD finding aid in order to add collection-level descriptive information and controlled access points to an existing container listing. Either activity would be accomplished by means of a programming script (see section 4.3.4 for more information). A MARC record exported from an EAD finding aid potentially could be uploaded into a larger MARC system, such as RLIN, OCLC, or a local online catalog. Repositories following this course would still retain the option of further editing the resulting MARC records using whatever MARC-based editing software they normally utilize. Automated routines have not yet been developed for these processes, but repositories wishing to explore these options are advised to consult the MARC-to-EAD crosswalk found in appendix B to identify concordances between data elements.
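The export direction can be sketched as follows. The hypothetical `marc_like_record` function groups encoding analogs by MARC tag and writes one line per field, merging subfield-coded analogs (such as 245 $a and 245 $f) into a single field while keeping repeatable fields such as 650 separate. It is only an illustration: the output is plain text rather than a MARC record, with no leader, fixed fields, or indicators, and it is not encoded for exchange. Producing a record suitable for loading into RLIN, OCLC, or a local catalog would require proper MARC software, and the reverse import from MARC into EAD is not shown.

```python
import xml.etree.ElementTree as ET

# Invented collection-level description carrying encoding analogs.
ARCHDESC = """
<archdesc level="collection">
  <did>
    <origination>
      <persname encodinganalog="100">Ellis, Mary, 1860-1931</persname>
    </origination>
    <unittitle encodinganalog="245$a">Mary Ellis papers</unittitle>
    <unitdate encodinganalog="245$f">1880-1931</unitdate>
    <physdesc encodinganalog="300$a">2 linear ft.</physdesc>
  </did>
  <controlaccess>
    <subject encodinganalog="650">Women--Correspondence.</subject>
    <subject encodinganalog="650">Family history.</subject>
  </controlaccess>
</archdesc>
"""


def marc_like_record(markup):
    """Build plain-text, MARC-like field lines from encodinganalog attributes.

    Analogs with a subfield code (e.g. "245$a", "245$f") are merged into one
    field per tag; analogs without one (e.g. "650") each start a new field,
    written as $a. No leader, fixed fields, or indicators are produced.
    """
    root = ET.fromstring(markup)
    fields = []   # (tag, [subfield strings]) in document order
    merged = {}   # tag -> position in `fields` for subfield-coded analogs
    for elem in root.iter():
        analog = elem.get("encodinganalog")
        if not analog:
            continue
        tag, _, code = analog.partition("$")
        text = " ".join("".join(elem.itertext()).split())
        if code:
            if tag not in merged:
                merged[tag] = len(fields)
                fields.append((tag, []))
            fields[merged[tag]][1].append(f"${code}{text}")
        else:
            fields.append((tag, [f"$a{text}"]))
    # sorted() is stable, so repeated tags such as 650 keep document order.
    return [f"{tag} {''.join(subs)}" for tag, subs in sorted(fields, key=lambda f: f[0])]


for line in marc_like_record(ARCHDESC):
    print(line)
# 100 $aEllis, Mary, 1860-1931
# 245 $aMary Ellis papers$f1880-1931
# 300 $a2 linear ft.
# 650 $aWomen--Correspondence.
# 650 $aFamily history.
```

Keeping the grouping rule this simple makes the sketch easy to read, but a production script would also need to handle indicators, punctuation conventions, and fields that have no EAD counterpart.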

While EAD provides a much more flexible and detailed data structure for archival description than does MARC, EAD is a data structure standard, not a data content standard, and therefore does not mandate authoritative forms of content for any of its elements. This is potentially a significant drawback for information exchange. Standardization of the content of EAD descriptive elements can be achieved, however, if repositories or consortia develop and adhere to specific data content conventions, or "best practices." The content of EAD elements that have encoding analog attributes can be chosen based on a data content standard such as RAD or APPM, or a data value standard such as the Library of Congress Name Authority File (LCNAF) or Library of Congress Subject Headings (LCSH).
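A repository or consortium could check adherence to such a convention automatically. In the sketch below, the rule itself is hypothetical: every controlled access term must carry a `source` attribute, and only `lcnaf` and `lcsh` are approved. The check confirms only that a vocabulary is declared; it cannot confirm that a heading actually matches an LCNAF or LCSH entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical local convention: every controlled access term must name its
# vocabulary in a source attribute, and only these vocabularies are approved.
APPROVED_SOURCES = {"lcnaf", "lcsh"}
ACCESS_TAGS = {"persname", "famname", "corpname", "geogname", "subject"}

CONTROLACCESS = """
<controlaccess>
  <persname source="lcnaf">Ellis, Mary, 1860-1931</persname>
  <subject source="lcsh">Women--Correspondence.</subject>
  <subject>letters, family</subject>
  <geogname source="local">Richmond area</geogname>
</controlaccess>
"""


def convention_problems(markup):
    """Report access terms that break the (hypothetical) source convention."""
    root = ET.fromstring(markup)
    problems = []
    for elem in root.iter():
        if elem.tag not in ACCESS_TAGS:
            continue
        term = " ".join("".join(elem.itertext()).split())
        source = elem.get("source")
        if source is None:
            problems.append(f"<{elem.tag}> {term!r}: no source attribute")
        elif source not in APPROVED_SOURCES:
            problems.append(f"<{elem.tag}> {term!r}: unapproved source {source!r}")
    return problems


for problem in convention_problems(CONTROLACCESS):
    print(problem)
# <subject> 'letters, family': no source attribute
# <geogname> 'Richmond area': unapproved source 'local'
```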

1.7. Other Resources for Learning about EAD

In addition to the official EAD documentation comprising the Tag Library and these Guidelines, other resources are available to assist those interested in learning more about EAD.

1.7.1. Readings

The published literature on EAD is growing gradually, and special issues of several library and museum journals were in the planning stages as of early 1999. The first significant body of articles about EAD was published in the summer and fall 1997 issues of the American Archivist (vol. 60, nos. 3-4), which were special thematic issues devoted entirely to EAD.⁽²⁶⁾

The summer issue (Context and Theory) contains six articles written by members of the EAD development team that provide background information on these topics: aspects of the history of archival description and of information systems that establish the context within which EAD was developed; the nature of structured information in general and of EAD's structure in particular; administrative and technical issues that must be considered prior to implementing EAD; and EAD's significance as an emerging standard for archival description.⁽²⁷⁾

The fall issue (Case Studies) contains six case studies written by EAD "early implementers," which is to say archivists at institutions that implemented EAD while it was still under development, prior to publication of the Version 1.0 DTD in August 1998. The first case study describes the process of "reengineering" finding aids to conform to EAD's data structure and to maximize user comprehension within the Web environment; this article may be the very best place for an archivist contemplating EAD implementation to begin reading.⁽²⁸⁾ The other five articles detail the software, hardware, and encoding choices made by particular institutions in the course of EAD implementation. The case studies may be particularly meaningful after you have read chapters 1 through 3 of these Guidelines, because the significance of the various institutions' choices will then be clearer.

1.7.2. Web Sites

Of the many World Wide Web sites containing useful information on EAD, these two are key:

The Encoded Archival Description Official Web Site, hosted by the Library of Congress, is the official source of the EAD DTD files. This site also includes background information on the development of EAD, instructions for subscribing to the EAD Listserv, and descriptions of major EAD implementation sites, including significant cooperative projects. The site is available at: <http://www.loc.gov/ead/>.

The EAD Help Pages, maintained by the EAD Roundtable of the Society of American Archivists, contain a wide variety of useful information and links to other helpful sites. Specific items include tools and helper files, descriptions of the authoring and publishing software used by various EAD implementers, readings on SGML and XML, and an "I need help!" feature in which users can write for assistance with specific questions. The site is available at: <http://jefferson.village.virginia.edu/ead/>.

Other well-maintained Web sites focusing on EAD, SGML, and XML are listed in the bibliography in appendix G.

1.7.3. Training Opportunities

The Research Libraries Group began offering EAD workshops to its member institutions in 1996, and the Society of American Archivists did the same starting in 1997. These two-day workshops introduce archivists to the most important structural and content elements of EAD and include numerous hands-on exercises designed to enable graduates to return to their repositories and begin adapting their finding aids to fit within EAD. Some of the SAA workshops have open registration (these are advertised in SAA's bimonthly newsletter Archival Outlook⁽²⁹⁾ and other archival media), while others are sponsored by regional archival societies, local consortia, or individual institutions.⁽³⁰⁾ In addition, the University of Virginia offers a five-day EAD course as part of its annual summer Rare Book School program.⁽³¹⁾ Other organizations have sponsored EAD training courses as well.

As with most adult education, attending a workshop can be an exceptionally helpful way in which to begin learning a new standard, particularly one based in state-of-the-art information technologies. The combination of a well-informed instructor and a cadre of fellow students, eager to learn and to share their experiences, can serve both to demystify many aspects of EAD and to build confidence in your ability to succeed.⁽³²⁾ On the other hand, it is important to note that any complex standard takes time to master fully; a workshop can only give you the basics and get you started on the right foot. These Guidelines can help reinforce and expand on such instruction and lead you to additional resources to address your increasingly sophisticated learning needs.

Footnotes

8. Encoded Archival Description Tag Library, Version 1.0 (Chicago: Society of American Archivists, 1998).
9. ISAD(G): General International Standard Archival Description, adopted by the Ad Hoc Commission on Descriptive Standards, Stockholm, Sweden, 21-23 January 1993 (Ottawa, Ontario: International Council on Archives, 1994). Also available at: <http://www.archives.ca/ica/isad(g)e.html>.
10. Bureau of Canadian Archivists. Planning Committee on Descriptive Standards, Rules for Archival Description (Ottawa, Ontario: Bureau of Canadian Archivists, 1990).
11. See, for example: Daniel V. Pitti, "Encoded Archival Description: The Development of an Encoding Standard for Archival Finding Aids," American Archivist 60 (summer 1997): 268-83. Also see the background information on the Encoded Archival Description Official Web Site, available at: <http://www.loc.gov/ead/>.
12. Steven L. Hensen, Archives, Personal Papers, and Manuscripts: A Cataloging Manual for Archival Repositories, Historical Societies, and Manuscript Libraries, 2d ed. (Chicago: Society of American Archivists, 1989).
13. Encoded Archival Description Tag Library, Version 1.0, 1-3.
14. This section is based in part on: Steven L. Hensen, "'NISTF II' and EAD: The Evolution of Archival Description," American Archivist 60 (summer 1997): 284-296.
15. Library of Congress. Descriptive Cataloging Division. Manuscripts Section, National Union Catalog of Manuscript Collections (Washington, D.C.: Library of Congress, 1959-1993).
16. Philip Hamer, Guide to Archives and Manuscripts in the United States (New Haven: Yale University Press, 1961).
17. Important union lists were published in other countries as well. For example, the most comprehensive such publication in Canada was: Public Archives of Canada, Union List of Manuscripts in Canadian Repositories = Catalogue collectif des manuscrits archives canadiennes, edited by Robert S. Gordon (Ottawa: Public Archives of Canada, 1968).
18. Controlled access points consist of personal, family, corporate, and geographic name headings; topical subject headings; form and genre headings; etc., that are constructed according to standards or rules or drawn from an authorized thesaurus or list.
19. ISAD(G), rule I.2.
20. See appendix B for a specific data element-to-data element crosswalk between ISAD(G) and EAD.
21. Sections 1.3 through 1.5 are based in part on: Daniel V. Pitti, "Encoded Archival Description: The Development of an Encoding Standard for Archival Finding Aids," American Archivist 60 (summer 1997): 268-83.
22. For more information, see the Text Encoding Initiative Home Page, available at: <http://www-tei.uic.edu/orgs/tei/>.
23. The EAD Header <eadheader> is explained in section 3.6.1.
24. ISAD(G), rule 2.4.
25. The formal name of the XML specification is Extensible Markup Language.
26. The articles in these two issues also were reissued as: Jackie M. Dooley, ed., Encoded Archival Description: Context, Theory, and Case Studies (Chicago: Society of American Archivists, 1998).
27. Several of these articles served as the basis for sections of these Guidelines, as indicated in the relevant footnotes.
28. Dennis Meissner, "First Things First: Reengineering Finding Aids for Implementation of EAD," American Archivist 60 (fall 1997): 372-87.
29. Archival Outlook (Chicago: Society of American Archivists), ISSN 1520-3379. Published six times each year.
30. For more information about hosting an EAD workshop, contact the SAA Education Office. Phone: 312/922-0140. Fax: 312/347-1452. Mail: Society of American Archivists, 527 S. Wells Street, 5th floor, Chicago, IL 60607 USA.
31. Information is available online at: <http://www.virginia.edu/oldbooks/>.
32. Additional training issues are discussed from a managerial perspective in section 2.5.2.
