EAD Application Guidelines for Version 1.0

Chapter 5: Publishing EAD Documents

5.1. Overview of the Publishing Process
5.2. Resource Discovery
5.2.1. Listings on a Web Page
5.2.2. MARC Records Linked to Finding Aids
5.2.3. Full-Text Search Software
5.3. Resource Delivery
5.3.1. How Browsers Display Text
5.3.2. File Format Delivery Options
    SGML Format
    XML Format
    HTML Format
5.3.3. Stylesheet Languages
    Cascading Style Sheets (CSS)
    Extensible Style Language (XSL)
    Document Style Semantics and Specification Language (DSSSL)
    Format Output Specification Instance (FOSI)
    Proprietary Style Languages
    Stylesheet Examples
5.3.4. Printed Output from EAD Documents
5.4. File Management

5.1. Overview of the Publishing Process

Encoding a finding aid in EAD is the first important step in making the content of an archival collection electronically available to users; the encoded finding aid may then be "published." Three essential aspects of electronic publication of finding aids are addressed in this chapter:

5.2. Resource Discovery

A principal goal of EAD is to enable users to identify relevant collections and then to locate specific materials therein. As a first step, a repository must of course provide some means by which finding aids can be discovered, searched, and displayed. While there are a variety of techniques for dissemination and access, including compilations on CD-ROM or distribution via a client-server application over a local area network, the most common choice today is through the use of browser technology, either over the World Wide Web or via a local intranet. Three approaches, including combinations of the three, currently are in vogue for providing access to finding aids:

5.2.1. Listings on a Web Page

For repositories with ready access to a Web site, the simplest publishing solution is to create standard HTML-encoded Web pages that contain references to collections for which EAD finding aids are available. Such citations may be in the form of brief entries (as in a repository guide); may be embedded in a narrative or bibliographic essay on some aspect of the institution's holdings; or may be presented as simple lists of collections arranged by creator, repository, collection area, or subject matter. Current examples of this approach range from simple listings that enumerate the names of collections held by the institution, to groupings of holdings by broad subject areas, to listings that contain brief notes on the scope of each collection in the manner of an annotated bibliography. Finding aids may be accessed, one at a time, by selecting a hyperlink on the Web page that retrieves the EAD document and loads it to the user's browser. This scenario does not provide the capability to search the contents of multiple finding aids simultaneously, but once a user downloads a finding aid onto her computer, she can use the searching features of the browser software to perform keyword queries of the document's contents.

This option requires space on a Web site where a repository can load its files and the technical training necessary to design and create HTML-encoded pages.
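
As a rough illustration of such a listing page, the following Python sketch generates a simple HTML list in which each collection name links to its finding aid. It is only a sketch under stated assumptions: the function name build_listing, the collection titles, the notes, and the file paths are all hypothetical and do not come from any particular repository.

```python
from html import escape

def build_listing(collections):
    """Return an HTML page listing collections, each linked to its finding aid."""
    items = []
    for title, note, url in collections:
        # Escape all text so that characters such as "&" remain valid HTML.
        items.append('<li><a href="{0}">{1}</a>: {2}</li>'.format(
            escape(url, quote=True), escape(title), escape(note)))
    return ("<html><head><title>Finding Aids</title></head><body>\n"
            "<h1>Finding Aids</h1>\n<ul>\n"
            + "\n".join(items)
            + "\n</ul>\n</body></html>")

# Hypothetical collections and paths, for illustration only.
collections = [
    ("Smith Family Papers", "Correspondence and diaries", "findaids/smith.xml"),
    ("Acme & Co. Records", "Business ledgers", "findaids/acme.xml"),
]
page = build_listing(collections)
print(page)
```

Each entry carries a brief note in the manner of an annotated list; a repository could just as easily group the entries by subject or creator before generating the page.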

Advantages: This approach is relatively easy to accomplish and inexpensive to carry forward if you have access to a Web site.

Disadvantages: The only information that the user initially has about the contents of the collection is the context that is provided on the page that contains the link to it. This may make the process of selecting relevant finding aids somewhat tedious.

5.2.2. MARC Records Linked to Finding Aids

Many U.S. repositories use MARC-based online catalogs as one avenue of access to their collections. A growing number of such catalogs now have Web interfaces and are another vehicle for making finding aids available to researchers.

The USMARC format includes field 856, which is used to record a citation to any external electronic resource, such as an EAD-encoded finding aid that is related to the collection described in that MARC record. Field 856 may include a uniform resource locator (URL) for that resource. Web interfaces to online catalogs render the URL as a hyperlink, which, when selected, displays the associated file, in this case a finding aid, in the user's browser. Repositories that use a Uniform Resource Name (URN) can record their globally unique identifier as a "handle" in the 856 field.

		856 42  $3 finding aid $d eadpnp $f pp996001 $g
				urn:hdl:loc.pnp/eadpnp.pp996001 $u

		856 42  $3 An electronic version of the inventory for this
				collection may be found at $u

	Figure 5.2.2a.  Two examples of the USMARC 856 field.
			The first contains a URN ($g) followed by a URL ($u) and the
			second only a URL ($u).
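
To make the subfield structure of such a field concrete, here is a minimal Python sketch that splits the displayed form of an 856 field into its subfields. It is not a MARC library: it assumes the display convention used in the figure (a dollar sign followed by a one-character subfield code), the function name parse_856 is hypothetical, and the URL in subfield $u is invented for illustration because the figure does not show one.

```python
import re

def parse_856(field_text):
    """Split the subfield portion of a displayed 856 field into (code, value) pairs.

    Assumes each subfield starts with "$" plus a one-character code, as in
    the figure; this is a display convention, not the raw MARC encoding.
    """
    pairs = []
    # Each match: "$", one code character, then text up to the next "$" or the end.
    for code, value in re.findall(r"\$(\w)\s*([^$]*)", field_text):
        # Collapse runs of whitespace left over from line wrapping.
        pairs.append((code, " ".join(value.split())))
    return pairs

# Hypothetical field: the URL in $u is invented for illustration.
field = ("$3 finding aid $d eadpnp $f pp996001 "
         "$g urn:hdl:loc.pnp/eadpnp.pp996001 "
         "$u http://www.example.org/findaids/pp996001.sgm")
subfields = dict(parse_856(field))
print(subfields["g"])
print(subfields["u"])
```

A Web interface to the catalog performs a comparable step when it turns the contents of subfield $u into a hyperlink.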

Advantages: The technical requirements for this approach are relatively modest and inexpensive if your online catalog has a Web interface. You will need to modify the relevant MARC record by adding the appropriate linking data in the MARC 856 field. Additionally, you will need space on a Web-accessible server to mount the EAD files.

This approach leverages existing indexing in the online catalog that captures information about the context and content of each collection. It mimics existing two-step finding aid systems in which the researcher first queries the catalog for broad name, place, or topical indexing of the collection, and is then directed to a finding aid in a separate file for more detailed information. With the introduction of MARC catalogs into many repositories, the first part of this process (consulting the online catalog) was automated. With online access to EAD files, this becomes an all-electronic process as the MARC record is linked dynamically to the online finding aid.

Disadvantages: This scenario presumes the availability of both MARC records for collections and a Web-accessible online catalog. The cost of creating the former or installing the latter just to make finding aids accessible may be quite expensive. It also creates an ongoing cost in the maintenance of the electronic links between the catalog entry and the EAD file (see section 5.4 for a discussion of file management issues).

While using the catalog as a gateway to one's finding aids provides many access points into the collection, users still cannot initially search the full text of the finding aid; whether this is a weakness or an advantage is a matter of perspective. One point of view holds that this is less than fully satisfactory, since full searching of the rich content of one or many finding aids is not possible. The other side argues that the summary catalog description, with access terms constructed in standardized forms, performs a useful filtering function, limiting search results to those aspects of the collection deemed sufficiently important to appear in a synopsis. For many reasons, this may be preferable to being bombarded with a large number of irrelevant hits such as a full-text search of large finding aid files may bring. The results of such a query may resemble the consequences of attempting to drink a sip of water from a gushing fire hydrant.

5.2.3. Full-Text Search Software

The most sophisticated and technically complex delivery method is to provide simultaneous, full-text querying of multiple finding aids by specialized search software. These tools permit the user to search the entire contents of many finding aids simultaneously, taking advantage of the EAD markup to facilitate requests for specific types of data, such as titles, dates, or names. The growing popularity of the XML standard is increasing the number of such systems that can handle EAD-encoded files.

The applications that fall into this broad group offer so many different features that they are difficult to categorize precisely. They include basic Web authoring and distribution products that offer search and retrieval capabilities, as well as complex, feature-rich document management systems, usually marketed to large corporations, that have numerous files and sophisticated publishing requirements. Many are quite expensive. Nevertheless, such systems may be suitable for large institutions with extensive holdings of electronic texts, or for consortia that share online services. Additionally, several firms offer substantial educational discounts, which can greatly reduce a repository's software costs.

The Enigma search engine from Insight, Inc., the InQuery search engine from the Center for Intelligent Information Retrieval, the BASIS text database from Open Text, and the Dual Prism software suite from AIS Software offer electronic publishing for SGML and XML files. Other applications include document management software such as the Livelink system from Open Text; Live Publish from the Folio Products Division of Open Market; Arbortext's Epic system; and INSO's DynaText family of products, which includes DynaTag and a Web publishing component called DynaWeb.

To provide access to collections in this manner, an archives must install the software that will index the files and format them for display on a Web-accessible server (see section 5.3 for information on the use of stylesheets in formatting documents). Each of these products includes a search interface that is partially to fully customizable by the archives. In a typical query, the user enters search terms that are passed to the server. The server executes the request and returns the results to the user's browser, typically listing brief information about each relevant collection. The user may then select and download the desired finding aids one at a time. This process is analogous to the brief listings of book titles that many online catalogs deliver in response to a search that results in multiple hits.

Advantages: The chief virtue of this method is that users can search simultaneously the full text of many finding aids, either from a particular repository or from multiple institutions, as in a union catalog. A query may reveal information about some portion of a collection that is significant to a particular researcher, but that does not constitute a sufficiently large part of the collection to have been noted in a summary MARC catalog record; many finding aids contain a wealth of information about subject content that the researcher can mine in this way. The search interface may be structured to permit retrieval based on specific content markup, such as enabling a search for all data encoded in a <corpname> element by prompting users to limit a search to "names of organizations." This approach provides easy, integrated access to many collections.
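
The idea of limiting a search to specific content markup can be sketched in a few lines of Python. This is a toy illustration, not a description of how any of the products named above work: the function name search_element and the two abbreviated finding aids are hypothetical, and the sample documents are not valid against the EAD DTD.

```python
import xml.etree.ElementTree as ET

def search_element(documents, tag, term):
    """Return names of documents in which any <tag> element contains term."""
    hits = []
    needle = term.lower()
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for element in root.iter(tag):
            # itertext() gathers text from the element and its descendants.
            text = "".join(element.itertext())
            if needle in text.lower():
                hits.append(name)
                break
    return hits

# Two hypothetical, heavily abbreviated finding aids.
documents = {
    "a.xml": "<ead><archdesc><scopecontent><p>Letters to "
             "<corpname>Acme Mining Company</corpname>.</p>"
             "</scopecontent></archdesc></ead>",
    "b.xml": "<ead><archdesc><scopecontent><p>Notes on mining in "
             "<geogname>Nevada</geogname>.</p>"
             "</scopecontent></archdesc></ead>",
}
print(search_element(documents, "corpname", "mining"))
```

Both sample documents contain the word "mining," but only the first contains it inside a <corpname> element, so a search limited to "names of organizations" returns only that document.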

Disadvantages: Search engines are relatively expensive to acquire and require advanced computing skills to program and maintain. As mentioned in section 5.2.2, your ability to perform in-depth queries may increase recall but will certainly decrease the precision of many searches. The nature of the query interface, the type and level of indexing, and the presentation of the results set are additional theoretical and practical concerns that have not been clearly developed at this early stage in the implementation of EAD. More experience with retrieval issues will clarify user understanding of and requirements for both the search interface and the display of results, which will, in turn, help to refine these applications.

5.3. Resource Delivery

A key step in the finding aid "publication" process is delivery of finding aid files to a user's browser in a form that the browser can display. A repository has several technical options for making this happen; the possibilities are determined by Web technology and involve a complex interplay of three related factors:

5.3.1. How Browsers Display Text

EAD markup designates only the structure and content of the document, not how it will appear on the page or computer screen; a specific method must be employed to format the file for display. The solution is a stylesheet, which is a set of instructions that governs how the EAD file will be formatted. This formatting may be applied to the finding aid either at the repository before it is sent to the user, or on the viewer's browser. In the latter instance, the stylesheet (which is simply a text file) is sent along with the EAD document to the browser, which then processes it.

Before learning about various stylesheet formats, we will examine how a browser might process text files encoded in SGML, XML, or HTML in order to display them correctly. In a typical scenario, the browser initially reads the encoding scheme of the document and interprets its structure as a tree, in which the document element (see figure 6.2.1b), such as <ead> or <html>, is the base of the tree. For example, the tree for an EAD document will have two or three major branches (<eadheader>, the optional <frontmatter>, and <archdesc>). The <archdesc> branch subdivides into <did>, <scopecontent> and other branches, which in turn further subdivide until one finally comes to the "nodes" or leaves at the end of each branch, where the textual content of the finding aid is found. For an example of how such a tree might be graphically represented, examine the Windows Explorer feature of the Windows operating system, which displays the hierarchical and nested relationships of the drives, directories, subdirectories, and files on your computer.
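
The tree described above can be made visible with a short Python sketch that parses a small document and prints one indented line per element. The sample instance is hypothetical and heavily abbreviated (it is not valid against the EAD DTD), and the function name outline is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, showing the nesting of the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# A hypothetical, heavily abbreviated EAD instance.
sample = """<ead>
  <eadheader><eadid>example</eadid></eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle><unitdate>1900-1950</unitdate></did>
    <scopecontent><p>Correspondence and diaries.</p></scopecontent>
  </archdesc>
</ead>"""
tree_lines = outline(ET.fromstring(sample))
print("\n".join(tree_lines))
```

The printed outline starts at <ead>, branches into <eadheader> and <archdesc>, and ends at elements such as <unittitle> and <p>, where the text of the finding aid is found.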

Once the tree has been built, the browser compares the file to its DTD to ensure conformance. A presentation engine in the browser then renders the text of each node for display, controlling properties such as screen placement and the size, type, and color of the font. In doing this, it follows certain rules embedded in a stylesheet.

The formatting rules (or stylesheet) for HTML documents are hardwired into the browser; in other words, they are included as part of the programming of the browser software. The display of each HTML element in Navigator or Internet Explorer is thus determined in advance by Netscape or Microsoft (each in a slightly different way). This is feasible because there are only about 80 HTML tags whose display must be predefined.

The default display may be overridden, however, by an external stylesheet file that causes the presentation engine in the browser to display the document in a different way. Such external stylesheets are optional for HTML files but are required when the document sent to the browser is encoded in XML or SGML; this is because the formatting of elements in these schemes is not built into the browser. Indeed, this formatting cannot be preordained, since the number of elements that could be defined by current and future DTDs in SGML or XML is virtually unlimited. A stylesheet therefore must be employed.

5.3.2. File Format Delivery Options

Just as there are multiple methods for authoring EAD-encoded finding aids, there are several ways in which these documents may be delivered electronically to users. One critical factor that differentiates these methods is the file format in which the document is transmitted, and three possibilities exist: SGML, XML, or HTML. The choice of stylesheet methodology is circumscribed by the file format decision.

SGML Format

Inasmuch as EAD began as an SGML-based application, and EAD files are created and stored in SGML format, it would seem reasonable to deliver documents to users in their original EAD encoding. Unfortunately, software developers have chosen not to build SGML functionality directly into their Web browsers, due at least in part to the technical complexity that stems from the standard's great flexibility. Neither Netscape Navigator nor Microsoft Internet Explorer can interpret the structure of an EAD file in SGML syntax. However, a user can install another program, either Panorama Publisher from Interleaf (formerly sold by SoftQuad) or MultiDoc Pro from Citec, that works with the browser to display an SGML file. The reader's browser is configured so that when it receives an SGML file, the helper application is loaded into the browser, interprets the file, builds the document tree, and configures its display.

To deliver SGML manifestations of EAD-encoded finding aids on the Web, you must mount your EAD files on a Web-accessible server, along with the necessary stylesheets (87) and navigators and the EAD DTD files. In turn, users must have loaded either the Panorama or MultiDoc Pro software on their computers and have properly configured their browser to work with it. EAD documents are matched with specific stylesheets either through an association provided by a catalog file on the server (see section ) or by a processing instruction embedded in each finding aid that points to the relevant stylesheet file, also stored on the server. Processing instructions (PIs) are an SGML device for inserting into a document information that is intended for processing by a proprietary software application rather than by a parser.

Advantages: Panorama and MultiDoc provide a very effective presentation of the finding aid, including a useful navigator feature that provides a visual road map of the document, enhances user understanding of the collection, and aids in sophisticated searching that is built into the software. The presence of the entire document, with its full SGML structural encoding, on the user's computer permits fast and powerful searching based on content markup, as well as speedy navigation through the document once it has been downloaded.

Disadvantages: Unfortunately, unlike many other Web viewers and plug-ins, neither of these applications is available for free, but must be purchased by the user who wishes to display your finding aids. Since casual users may be unwilling to spend the time or money necessary to acquire the software, the usefulness of this scenario is diminished for general Web distribution. It may be more feasible in closed environments, such as a single archives, library or campus that can supply all users with the viewer (which could then be used not only for viewing finding aids, but also other SGML-encoded documents such as scholarly texts). Additionally, although the stylesheet language employed by each product is accessible through a robust editor, both are proprietary. Style specifications developed for this publishing environment will not be transferable to others.

XML Format

XML was developed to provide the power and functionality of SGML on the Web, and EAD documents can be made XML-compliant (see section 4.3.2 for details). In this publishing scenario, the archives stores EAD-encoded finding aids in XML format (rather than in their native SGML format) on a Web-accessible server. Each EAD instance includes a processing instruction that points to the location of the stylesheet that is to be applied to its presentation. After the finding aid has been downloaded to the user's computer, the browser retrieves the referenced style file (written either in the CSS or XSL syntax; see section and section for more information) from the archives' server, and then uses its presentation engine to display the finding aid properly.
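
The following Python sketch shows what such a processing instruction looks like and how software can read it. It assumes the "xml-stylesheet" processing instruction convention defined by the W3C for associating stylesheets with XML documents; the text above does not name a specific instruction, and the stylesheet file name findaid.css and the function name stylesheet_instructions are hypothetical.

```python
from xml.dom import minidom

def stylesheet_instructions(xml_text):
    """Return the data of each xml-stylesheet processing instruction in the prolog."""
    document = minidom.parseString(xml_text)
    return [node.data for node in document.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == "xml-stylesheet"]

# Hypothetical instance; the stylesheet file name is invented for illustration.
sample = ('<?xml version="1.0"?>'
          '<?xml-stylesheet href="findaid.css" type="text/css"?>'
          '<ead><archdesc level="collection"/></ead>')
found = stylesheet_instructions(sample)
print(found)
```

A browser performs a comparable lookup before fetching the referenced style file from the server; a document with no such instruction yields an empty list here.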

Advantages: As with SGML, the browser can take full advantage of the structural markup in EAD to effect fast and powerful searches of the document's content. Unlike the SGML scenario, however, no helper application is required because all the required functionality is included in the standard Web browser. The end user needs no special software.

Disadvantages: The chief drawback to this approach is that, at the time these Guidelines were written, XML functionality was as yet available only in Internet Explorer 5.0, although Netscape is building XML capability into the next release of its Navigator browser. Users with older browsers may use Panorama Publisher or MultiDoc Pro as a helper application to display XML documents in the same way that they process SGML files. It will be a number of years before a critical mass of Web users will be using newer browser versions that can read XML files directly. Until that time, archives will need to provide an alternative delivery method, probably in HTML.

HTML Format

Given the justification provided in chapter 1 for encoding finding aids in SGML or XML, it may strike you as odd that HTML is suggested as a delivery format for EAD finding aids! But note carefully the use of the phrase "delivery format": HTML is a useful tool for the distribution and presentation of text and images, which are its intended purposes and strengths, despite its substantial shortcomings as a data storage format.

The experience of libraries and archives in disseminating MARC-encoded catalog records provides an informative analogy. While such institutions continue to appreciate the many virtues of creating, storing and searching catalog records in MARC format, they have quickly embraced Web interfaces to their online catalogs that deliver records to users in HTML format. No one has seriously suggested, however, that use of MARC be discontinued and that catalog data be created directly in HTML, since the result would be loss of the ability to search by specific types of data such as author, title, and subject. Such a move would seriously cripple user searching of collections, as well as the long-term viability of the catalog records as data rather than as undifferentiated text.

You can achieve the best of two worlds by encoding your finding aids in EAD and then using HTML as the vehicle for publishing them. You accomplish this by converting the markup of the finding aid from the EAD encoding scheme into HTML syntax. This process, technically referred to as "transformation," may happen at the repository in either of two ways.

Several of the publishing systems described earlier, including DynaWeb and Dual Prism, can generate HTML versions of a finding aid in real time at the moment that the user requests the file; this is known as dynamic transformation. Custom scripting, in programming languages such as Perl, works in conjunction with SGML-aware search engines to generate the HTML version "on the fly," with the script acting as a stylesheet to map data from one tag set to the other. Another software option in this category is Microsoft's Web server software (IIS), which can use a stylesheet written in the XSL language to transform an XML file into HTML, and then send the file out to a reader using its Active Server Page technology. As XML tools mature, such transformation into HTML may occur on both the user's computer and the repository's server.
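
The notion of a script "acting as a stylesheet to map data from one tag set to the other" can be sketched as follows. Python is used here in place of Perl purely for illustration, and the sketch does not represent how any of the products named above work. The mapping table TAG_MAP, the function name to_html, and the sample fragment are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from EAD element names to HTML element names.
TAG_MAP = {"archdesc": "div", "did": "div", "unittitle": "h1",
           "unitdate": "p", "scopecontent": "div", "p": "p"}

def to_html(element):
    """Copy an EAD element into a new tree, renaming each tag via TAG_MAP."""
    # Elements without a mapping fall back to a generic <span>.
    # EAD attributes are deliberately not copied to the HTML output.
    new = ET.Element(TAG_MAP.get(element.tag, "span"))
    new.text, new.tail = element.text, element.tail
    for child in element:
        new.append(to_html(child))
    return new

sample = ('<archdesc level="collection"><did>'
          '<unittitle>Example Papers</unittitle>'
          '<unitdate>1900-1950</unitdate></did>'
          '<scopecontent><p>Letters from <persname>Jane Doe</persname>.</p>'
          '</scopecontent></archdesc>')
html_text = ET.tostring(to_html(ET.fromstring(sample)),
                        encoding="unicode", method="html")
print(html_text)
```

Note that the content markup (for example, <persname>) disappears in the output, which illustrates why structured searching is lost when only the HTML version reaches the user.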

Alternatively, a finding aid may be rendered into HTML code by the repository and stored on its Web server before any user requests the file. Currently employed conversion techniques include word processing macros, scripts written in Perl, and transforming software such as the Microsoft XSL Processor. The Internet Archivist authoring software has a built-in SGML to HTML converter. The problem with such a priori transformation is that one loses some of the functionality of the stylesheet. If a change in the document structure is required, the SGML-encoded master copy is updated, and the HTML version is regenerated. In this scenario, therefore, each finding aid must be individually reprocessed. With dynamic transformation, on the other hand, the results that the user gets on the browser reflect the most up-to-date version of the format without requiring that the individual documents be edited.

Advantages: Delivering finding aids as HTML solves the immediate problem that not all users can currently read SGML or XML files. HTML documents are accessible on any browser without additional effort by the researcher. Because standard HTML tags are used, no additional stylesheet need be generated.

Disadvantages: Unless the access and retrieval environment permits the user to search the original EAD-encoded document, the value of structured searching is lost. This searching limitation will exist as long as the file on the user's computer contains only the presentation markup of HTML. A less significant potential disadvantage is that staff will have to know both HTML syntax and the transformation language employed in order to implement this delivery option. Storing both an SGML or XML source file and an HTML presentation file for each encoded finding aid will also increase, perhaps double, the file storage space required on your server. Maintaining and updating two versions of each document is an additional expense, one that complicates both file management and processing workflows.

5.3.3. Stylesheet Languages

There are several standardized "languages" for writing stylesheets, including Cascading Style Sheets (CSS), Extensible Style Language (XSL), the Document Style Semantics and Specification Language (DSSSL), and Format Output Specification Instance (FOSI). There also are style languages that are proprietary to particular software products. XSL and DSSSL may be used to transform files from one encoding scheme to another, as described above, in addition to serving as stylesheet languages.

While your repository will have many different finding aids, you probably will need only a few stylesheets, one for each style of finding aid that you produce (such as one for small collections and another for more complex finding aids, or one for paper-based collections files and another for microfilms). Considerable potential exists for cross-institutional sharing in the development and use of stylesheets, with repositories adopting, or borrowing and modifying, existing ones from a shared pool of models. The resulting standardization in finding aid appearance, both within and across repositories, might well enhance user comprehension and interpretation of these complex information tools; such sharing also would simplify the finding aid distribution process. Sharing of stylesheets would mandate, however, substantial agreement within and across archives as to the format in which finding aids are to be encoded and displayed. While this obviously involves decisions relating to layout on the screen or page, the inclusion or omission of particular EAD elements also will affect such sharing, especially of legacy data.

Cascading Style Sheets (CSS) (88)

The CSS specification was originally developed as a way for Web authors to modify the "default" manner in which browsers display HTML files. The first version of CSS, called Level 1, focused on basic presentation issues such as margins, indention, and font characteristics such as size, weight, family, and color. Initial support for CSS was limited and inconsistent in Netscape Navigator and Microsoft Internet Explorer.

In May 1998, the more robust Level 2 version was approved by the World Wide Web Consortium (W3C) as an official Web Recommendation. Microsoft and Netscape both promise full support in their next software releases for Level 2, which features substantially richer formatting capabilities such as tables, as well as specifications for the output of print and screen displays.

The functionality of CSS is straightforward. Once the browser creates the document tree, it applies CSS styles to elements in the order in which they appear in the document. These styles may be applied to either XML or HTML documents, either by embedding the styling specifications directly in the document, or by linking the encoded file to a separate stylesheet file via an HREF link or a processing instruction in the finding aid.

Extensible Style Language (XSL) (89)

XSL was developed and adopted by the World Wide Web Consortium (W3C) especially for use with XML. It incorporates features of both CSS and DSSSL (see section ). The first iteration of XSL appeared as a W3C note in November 1997, followed by the first Working Draft release in August 1998. Final approval of XSL as a formal recommendation is not expected until summer 1999.

Supporters of XSL describe it as a more robust styling language than CSS, one that is intended to be employed in more complex presentation situations. Certainly XSL's pattern matching and formatting syntax is more sophisticated than CSS, though with a concomitant penalty in complexity. In addition to its styling functionality, XSL may also function as a transformation agent for the conversion of data from one syntax to another.

XSL also applies styles in a different way than CSS. Once the document tree has been constructed in the processor, XSL creates a second tree, the output tree; hence, the structure of the output can be different than that of the source. For example, you might decide to display the location <physloc> of an item before its title <unittitle>, even though <physloc> follows <unittitle> in the EAD instance. An XSL stylesheet can simply reorder the elements in the output tree without any alterations to the source document; styles are then applied to the output tree. This property of XSL also provides the capability to repeat the same data in two different parts of the display, such as by extracting headings to create a separate table of contents while also presenting the headings in situ throughout the finding aid. XSL accomplishes display either by using its own detailed format object specifications or by using the simpler display language of HTML. None of this is possible using CSS, which applies formatting directly to the document tree.
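
The difference between styling the document tree directly and building a separate output tree can be sketched in Python. This is an analogy only, not XSL itself: the function name build_output and the sample <did> fragment are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def build_output(did):
    """Build a separate output tree in which <physloc> precedes <unittitle>."""
    output = ET.Element("did")
    for tag in ("physloc", "unittitle"):
        for child in did.findall(tag):
            # Deep copies keep the source tree untouched.
            output.append(copy.deepcopy(child))
    return output

source = ET.fromstring("<did><unittitle>Diaries</unittitle>"
                       "<physloc>Box 4</physloc></did>")
output = build_output(source)
print([child.tag for child in source])
print([child.tag for child in output])
```

The source tree keeps its original order while the output tree presents the location first; styles would then be applied to the output tree.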

The relatively long approval schedule for XSL has not stifled the development of application software, including incorporation of XSL into Microsoft Internet Explorer 5.0. Several experimental tools are available, including XT and Jade from James Clark, the Koala XSL engine, and Microsoft's MSXSL "technology preview."

Document Style Semantics and Specification Language (DSSSL) (90)

DSSSL was developed by the SGML community as an alternative to the existing proprietary, vendor-specific style languages that were once the norm in the SGML publishing industry. Although DSSSL has never been widely adopted, noncommercial applications such as James Clark's Jade program exist that use it to generate output in a variety of formats. Also, DSSSL has been formally adopted as an ISO standard. Like XSL, DSSSL may be used as a transforming language; DSSSL has served as the basis for many other aspects of XSL as well.

Format Output Specification Instance (FOSI) (91)

FOSI was developed as a style language for use with the Department of Defense's CALS DTD, but it has not enjoyed wide popularity outside the defense industry environment. It is currently used in the Adept software marketed by Arbortext.

Proprietary Style Languages

In addition to languages based on open standards, style specifications exist that are proprietary to particular software products. These include the style language developed by Synex Corporation that is used by Interleaf's Panorama Publisher and Citec's MultiDoc Pro software. If a repository delivers SGML files to users of these applications, it must create appropriate stylesheets (and navigator files, which generate "tables of contents" for the browser) using the interactive editor function built into Panorama Publisher and MultiDoc Pro. Proprietary stylesheet-like functions are also built into server software programs such as DynaText, DynaWeb, Livelink, and Balise, which perform transformations of SGML and XML files into HTML.

Stylesheet Examples

Stylesheets written in CSS, XSL, or DSSSL syntax are textual documents consisting of a series of two-part rules. The first part of each rule defines the element or elements in the document to which the rule applies. This association may be based on the element's name, its position in the document, or its relationship to parent elements or subelements, attribute values, or other properties. The second component of each rule is an instruction that specifies some aspect of the display of the specified element, such as placement on the screen or page, font size, emphasis (bold or italic), or color.

The following examples show how stylesheet rules would be written in various languages to define the style for the display of the inclusive dates of the records in a finding aid. The stylesheet instructions specify that the dates of the archival materials are to appear on a separate line, in 12-point Times New Roman, colored navy, and prefaced by the text "Dates: ".

The first example uses an XSL rule to define display by use of XSL formatting objects: (92)

	<xsl:template match="archdesc[@level='collection']/did/
	<fo:block color="navy" font-size="12 pt" font-family="times new roman">

The second example uses an XSL rule to define display through the use of HTML formatting conventions. Technically, this is an XML to HTML transformation, since an XSL processor will generate HTML-encoded output as the result of applying this rule: (93)

	<xsl:template match="archdesc[@level='collection']/did/
		<P><FONT color="navy" face="times new roman" point-size="12">

The third example utilizes the conventions of the Cascading Style Sheets Level 2 specification:

	archdesc[level="collection"] > did >
	{color: navy; font-family: times new roman; font-size: 12pt}

	archdesc[level="collection"] > did >
	{content: "Dates: "}
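
The two-part structure of a stylesheet rule (a pattern that selects elements, paired with a display instruction) can also be illustrated outside any particular stylesheet language. The following Python sketch is only an illustration under stated assumptions: the sample finding aid is hypothetical and heavily simplified, the pattern assumes that the inclusive dates are held in a unitdate element with type="inclusive" directly inside did, and the names RULES and render are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified finding aid used only for illustration.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate type="inclusive">1850-1920</unitdate>
    </did>
  </archdesc>
</ead>"""

# Each rule has two parts: a pattern that selects elements and an
# instruction that says how to display them. The pattern is an assumption
# about where the inclusive dates are encoded.
RULES = [
    (
        ".//archdesc[@level='collection']/did/unitdate[@type='inclusive']",
        {
            "prefix": "Dates: ",
            "style": "color: navy; font-family: 'Times New Roman'; font-size: 12pt",
        },
    ),
]


def render(ead_xml, rules=RULES):
    """Apply each rule to the document and return simple HTML paragraphs."""
    root = ET.fromstring(ead_xml)
    paragraphs = []
    for pattern, instruction in rules:
        for element in root.findall(pattern):
            text = escape((element.text or "").strip())
            paragraphs.append(
                '<p style="{}">{}{}</p>'.format(
                    instruction["style"], escape(instruction["prefix"]), text
                )
            )
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(render(SAMPLE_EAD))
```

Because the pattern tests the level attribute of archdesc, the same rule produces no output for a description encoded at another level, which is the kind of context-sensitive matching described above.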

5.3.4. Printed Output from EAD Documents

Even though EAD enhances user access to collections by creating electronic versions of finding aids, repositories will still likely need to produce printed copies for local patrons and staff, or for other uses. There are a variety of options for accomplishing this, depending in part on your choice of authoring environments. As noted earlier, native editors either include printing capabilities or require the use of a separate software package to produce finely formatted print copies.

SGML printing applications typically may be used with any SGML instance and are not restricted to files created by related authoring tools from the same vendor, though the two software packages may be closely bundled. Microsoft's SGML Author for Word can convert any SGML document into a Word file, in the same way that it executes a conversion in the opposite direction from Word to SGML. Stylesheet languages and other processors may also be utilized. Current applications include use of the DSSSL standard to generate print output from SGML applications. The capacity to control printing is included in both the CSS and XSL languages, though no implementations of either have yet appeared.

Institutions that create HTML manifestations of their EAD documents might use the HTML file as a source of print copy. This would not be done through the print function of the browser, which typically has limited formatting capabilities, but by importing the HTML file into a word processor (see section for more information on use of HTML files). Current versions of Word, for example, can import HTML documents, remove the tags, and convert the results into a word processing format. Lacking more robust solutions, one can simply remove the tags from the ASCII SGML file and format the document manually. Freeware programs are available in Perl that will strip out the markup from SGML documents. (94) The Internet Archivist authoring software can generate a simply formatted ASCII text output of an EAD file as well. Other authoring tools such as Author/Editor, ADEPT Editor, and XMetaL, as well as the browser plug-in Panorama Publisher, also can produce nicely printed copies of encoded finding aids.
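
As a rough illustration of what a markup-stripping routine does, the following Python sketch removes comments and tags from an encoded text and resolves a few standard character entities. It is a minimal sketch under stated assumptions, not a substitute for the Perl freeware mentioned above: the function name strip_markup is invented for this example, the regular expressions will not cope with a document type declaration that contains an internal subset, and entities specific to a local DTD are left untouched.

```python
import re
from html import unescape

COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_markup(encoded_text):
    """Return the text content of an SGML or XML string, one item per line."""
    without_comments = COMMENT_PATTERN.sub("", encoded_text)
    without_tags = TAG_PATTERN.sub("", without_comments)
    # Resolve common character entities such as &amp;; unknown ones stay as-is.
    text = unescape(without_tags)
    # Collapse runs of whitespace and drop lines left empty by tag removal.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        "<did><unittitle>Smith &amp; Sons Records</unittitle>\n"
        '  <unitdate type="inclusive">1850-1920</unitdate></did>'
    )
    print(strip_markup(sample))
```

The resulting plain text still has to be formatted by hand or in a word processor, as noted above.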

5.4. File Management

As was true in the authoring process, effective local management of files is an important feature of a well-run EAD publishing implementation. The use and functionality of stylesheets, catalog files, conversion routines and other programs, and the sequence of internal workflow all must be documented. Version control of documents, including a written audit trail, will be essential as files pass through various manifestations.

Standardized file-naming conventions for internal storage, as well as techniques such as file-handling databases and purl resolvers or handle servers for maintaining persistent file names on the Internet, are critical both to the long-term sanity of the program administrator and to the accessibility of the files over time. Even the novice Web user has encountered the frustrating phenomenon of selecting a hyperlink on a Web page and receiving in return a message that the file could not be found. While there are many potential causes of this problem, one of the most frequent is that the creator has moved the file to another computer location without updating all links, resulting in broken connections.

A single online directory may hold many files, and therefore one promising solution is the creation of an index or other third-party device that stores the server and directory location of multiple files. This limits the information contained in any given hypertext link to a reference to a single persistent location (a server), where the storage details of many files are kept and may be updated simultaneously as filing systems change. This is made possible in SGML via a convention called the SGML catalog. Because images, text and other documents may be declared as entities and referred to by name rather than by address in an SGML document, it is possible to store the details about the actual computer location of these entities in an external and centralized catalog file. Unfortunately, XML has not incorporated this feature, but rather requires that all entities include both a relative entity name and a specific address in the form of a Uniform Resource Identifier (URI).
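
The idea behind a catalog file can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation of the catalog convention: it assumes one entry per line in the form of a keyword, a name, and a storage location, and the public identifier, entity name, and file paths shown are hypothetical.

```python
import shlex

# Hypothetical catalog entries: keyword, name or public identifier, location.
SAMPLE_CATALOG = """
PUBLIC "-//Example Repository//DTD Sample Finding Aid//EN" "dtds/sample.dtd"
ENTITY "logo" "images/repository-logo.gif"
"""


def parse_catalog(catalog_text):
    """Map (keyword, name) pairs to the storage location given in the catalog."""
    locations = {}
    for line in catalog_text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) == 3 and tokens[0] in ("PUBLIC", "ENTITY"):
            keyword, name, location = tokens
            locations[(keyword, name)] = location
    return locations


def resolve(locations, keyword, name):
    """Look up where a named entity is currently stored."""
    return locations[(keyword, name)]


if __name__ == "__main__":
    catalog = parse_catalog(SAMPLE_CATALOG)
    print(resolve(catalog, "ENTITY", "logo"))
```

When files are moved, only the catalog entries need to change; documents that refer to the entity by name are left alone.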

Other solutions might be employed, including purl resolvers and handle servers, both of which work in essentially the same manner.

The purl (persistent uniform resource locator) mechanism utilizes software developed by and freely available from OCLC,(95) and it functions in the following manner. When creating links in Web documents, the author embeds a purl in the document instead of using a conventional uniform resource locator (URL). The purl contains the Internet address of the purl server and a unique name for the document, image or other object that is referenced, instead of the document's absolute Internet address. When selected, the link sends a message to the resolver, which stores a full address for the external object and redirects the query to that location. In this way absolute Internet addresses are maintained on the resolver, where mass updates of information such as server and directory name changes are possible, rather than embedded in individual files, where any alterations in addressing would require the editing of many individual documents.
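
The redirection and mass-update behavior described above can be sketched as follows. This Python fragment is not the OCLC software and does not reproduce its interface; the class name PurlResolver, its methods, and the example host names and paths are all hypothetical, chosen only to show why a change of server requires edits in one place rather than in every document.

```python
from urllib.parse import urlsplit, urlunsplit


class PurlResolver:
    """Toy resolver: persistent names in documents, real addresses kept here."""

    def __init__(self):
        self._targets = {}

    def register(self, name, url):
        """Record the current absolute address for a persistent name."""
        self._targets[name] = url

    def resolve(self, name):
        """Return the address to which a request for this name is redirected."""
        return self._targets[name]

    def move_server(self, old_host, new_host):
        """Update every stored address on old_host; return how many changed."""
        changed = 0
        for name, url in list(self._targets.items()):
            parts = urlsplit(url)
            if parts.netloc == old_host:
                self._targets[name] = urlunsplit(parts._replace(netloc=new_host))
                changed += 1
        return changed


if __name__ == "__main__":
    resolver = PurlResolver()
    resolver.register("/findaids/smith", "http://old.example.org/ead/smith.sgm")
    resolver.register("/findaids/jones", "http://old.example.org/ead/jones.sgm")
    resolver.move_server("old.example.org", "new.example.org")
    print(resolver.resolve("/findaids/smith"))
```

Links embedded in finding aids continue to cite the persistent name, so none of the documents themselves need to be edited after the move.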

This approach does, however, have limitations. The purl software from OCLC runs only on a variety of UNIX platforms. Moreover, the purl naming convention does not effectively support links to specific locations within a document, only to the document as a whole.

Handles are one implementation of Uniform Resource Names (URNs) developed by the Corporation for National Research Initiatives (CNRI). (96) Handles are universally unique identifiers that are registered with a "naming authority," much in the same way that ISBNs are currently distributed and registered for published books. When a handle is used as a resource address in an encoded document, a handle server must resolve the handle, or unique identifier, into an actual address at which the desired resource can be located. Handles and handle servers are a relatively new development, and archivists interested in using this approach may obtain further information on The Handle System Web site.


  87. See section for more on Panorama style sheets. Both Panorama Publisher and MultiDoc Pro use the same style language.

  88. The World Wide Web Consortium provides further information about Cascading Style Sheets, available at: <http://www.w3.org/Style>.

  89. The World Wide Web Consortium provides further information about Extensible Style Language, available at: <http://www.w3.org/Style>. Information on this topic is also provided in Robin Cover's SGML/XML Web Page, available at: <http://www.oasis-open.org/cover/xsl.html>.

  90. The World Wide Web Consortium provides further information about Document Style Semantics and Specification Language, available at: <http://www.w3.org/Style>. Information on this topic is also available in Robin Cover's SGML/XML Web Page, available at: <http://www.oasis-open.org/cover/dsssl.html>.

  91. Robin Cover's SGML/XML Web Page provides further information about Format Output Specification Instance, available at: <http://www.oasis-open.org/cover/gov-apps.html#fosi>.

  92. The syntax used reflects the XSL Working Draft dated 16 December 1998; this may be modified prior to final adoption of XSL by the World Wide Web Consortium.

  93. The syntax of this example reflects the XSL Working Draft dated 16 December 1998; this may be modified prior to final adoption of XSL by the World Wide Web Consortium.

  94. For information on freeware Perl programs, consult the Perl Web site, available at: <http://www.perl.com>.

  95. Information regarding the purl strategy is available at: <http://purl.oclc.org>.

  96. For further information see CNRI's The Handle System Web site, available at: <http://www.handle.net/>.

Copyright Society of American Archivists, 1999.
All Rights Reserved.

Library of Congress Help Desk (11/01/00)