Some readers may find chapter 6 overly technical or confusing upon first reading. Chances are that some of its content will not immediately be useful when approaching EAD as a complete novice or with a limited understanding of basic computing technology. It may prove more helpful when revisited later on, after the reader grows more comfortable with the practicalities of EAD encoding. In the same way that you do not need to understand the chemistry of baking in order to make an apple pie provided you can identify the ingredients and follow the proper sequence in mixing them, most archivists will be able to create an EAD-encoded finding aid without a detailed understanding of either SGML or XML.
On the other hand, other readers may find this chapter overly basic if they need even more in-depth information about SGML or XML systems. These Guidelines provide ample citations to more comprehensive information resources that are available for those needing more detail than can be provided here (see the footnotes throughout this chapter, as well as the SGML/XML section of the bibliography in appendix G).
6.2. SGML Documents
The SGML standard (ISO 8879) is a metalanguage that defines the components of an SGML document.(98) These components are as follows:
SGML data structures are based on the premise that the contents of a type of document, such as a finding aid, can be described as a series of hierarchies. For example, a collection may contain series, those series may contain files, and those files may consist of individual documents. In SGML, these parent-child relationships are expressed in the DTD. The relationship of elements to one another may be visualized as a tree with a root and central trunk subdividing into smaller and smaller branches with nodes at the furthest extremity of each. Figure 6.2.1b illustrates this concept using the beginnings of EAD's hierarchical structure. The base of this tree structure is referred to in SGML parlance as the document element. In an EAD document it is represented by the start-tag <ead>, which signals the beginning of the body of an encoded instance, the base of this tree.
The prolog of all EAD-encoded document instances must begin with the following document type declaration:
<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN">
The string <!DOCTYPE identifies this as a document type declaration, which should not be confused with the similarly named "Document Type Definition." The document type declaration references the DTD that is being used by the document instance. Immediately following the <!DOCTYPE string is the document type name, which is defined by the SGML standard as the minimum literal string that can stand uniquely as a representation of the document type being declared, in this case "ead." In an SGML system, the document type name may be either in upper or lower case. However, since XML is case sensitive and the EAD DTD prescribes that all element and attribute names be in lower case, it may be safest always to render it in lower case to be certain that your files will work in either context.
Having identified the document type, the instance must now make its DTD available to processing applications. This can happen in either of two ways. The entire DTD may be embedded within the DOCTYPE declaration itself in what is called the declaration subset. The content of the declaration subset is delimited at the beginning by a left square bracket ([) and by a right square bracket (]) at the end. (101) More frequently, the DTD is stored for efficiency in an external file, which the DOCTYPE declaration references at this point, as in the EAD DOCTYPE declaration shown above. The reference may consist of a formal public identifier whose text is preceded by the keyword PUBLIC. Alternatively, a location-specific system identifier may be given as a reference, signaled by the keyword SYSTEM. The relative merits of public identifiers, system identifiers, and a combination of the two are discussed in section 188.8.131.52.1.
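The ways of referencing an external DTD described above can be sketched as follows. The public identifier is the one given earlier in this chapter; the file name ead.dtd used as a system identifier is only an assumed local path for illustration, not a location prescribed by these Guidelines.

```xml
<!-- 1. Public identifier only: the SGML system resolves it to a file -->
<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN">

<!-- 2. Public identifier followed by a system identifier (assumed path) -->
<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN" "ead.dtd">

<!-- 3. System identifier only (assumed path) -->
<!DOCTYPE ead SYSTEM "ead.dtd">
```

Only one of these declarations would appear in any given document instance.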
A valid document instance is one that uses markup in the manner specified by the particular DTD that it references. XML permits well-formed document instances that do not reference a DTD, but that must still adhere to the rules in the XML specification. Archivists choosing to use the EAD DTD in XML mode will produce document instances that are both well-formed and valid.
In SGML-based systems, the process of testing a document for compliance with the referenced DTD is called "parsing." Parsing is the process of resolving something complex into its component parts. The parsing application in an SGML system goes through several steps to accomplish this. In a typical scenario, it first reads the SGML declaration if there is one and then tests the DTD itself for SGML conformity. Next it reads or parses the markup, expanding text entities and separating text from markup. The document is then transformed into a tree structure so that in the final phase the application can validate the document by comparing its structure to that of the DTD. Simply stated, parsing identifies the markup, and validation compares that markup against the DTD.
The element declaration in a DTD is composed of the string <!ELEMENT, followed by an element name, followed by a content model, as in the following example:
<!ELEMENT element_name content_model>
The element name forms the core of each tag that delineates the content of an element in an individual instance of an encoded document. The SGML standard specifies that tags are identified by a left angle bracket (<), followed by the text of the element name and a trailing right angle bracket (>), and that there are two possible tags for each content-bearing element: a start-tag and an end-tag. In their simplest form these tags are identical, with one important exception: the end-tag includes a forward slash (/) between the left angle bracket and the element name.
Start-tag         End-tag
<element_name>    </element_name>
The start-tag can have a more complex form because it can be qualified by attributes (discussed in section 6.4), while end-tags cannot be so modified.
The following examples contain two fairly simple element declarations from the EAD DTD to illustrate the relationship between element declarations and the tags used in an individual instance of an encoded document. Figure 6.3a contains the EAD DTD declarations for the elements EAD <ead> and Change <change>:
<!ELEMENT ead (eadheader, frontmatter?, archdesc)>
<!ELEMENT change (date, item+)>
Figure 6.3a. Element declarations taken from the EAD DTD.
The element name for each element as declared in the EAD DTD is used to construct tags in individual EAD document instances, as illustrated in this example:
Start-tag    End-tag
<ead>        </ead>
<change>     </change>
The content model section of an element declaration in an SGML DTD specifies three characteristics of the element:
The order of occurrence in the content model sequence of the element declaration is determined by the following symbols:
|,||Comma||Required order (x then y)|
||||Vertical bar (pipe)||No required order (x or y)|
|( )||Parentheses||Content groupings within the broader content model|
The permitted frequency of occurrence of subelements is established by the following frequency indicators, which can appear immediately after the name of a particular element or after a content grouping delineated by a set of parentheses:
|Blank||Required, may only occur once|
|+||Plus||Required, may occur one or more times|
|?||Question mark||Optional, may occur only zero or one times|
|*||Asterisk||Optional, may occur zero, one or more times|
Both elements as declared in figure 6.3a are examples of a content model that can only contain other elements, also referred to as subelements. The first element declaration in figure 6.3a states that the element Encoded Archival Description <ead> must contain a single required subelement EAD Header <eadheader>, possibly followed by a single optional subelement Front Matter <frontmatter>, followed by a single required subelement Archival Description <archdesc>. The second element declaration states that the element Change <change> must contain a single required subelement Date <date>, followed by one or more required Item <item> subelements (see chapter 3 for a discussion of the use of these and other EAD elements and for examples of their use in encoded finding aids).
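As an illustration, the following hypothetical fragment satisfies the <change> declaration in figure 6.3a: exactly one <date>, followed by one or more <item> elements. The date and the item text are invented for the example.

```xml
<change>
  <date>1 August 1998</date>
  <item>Series titles corrected.</item>
  <item>Container list added.</item>
</change>
```

A validating parser would reject this fragment if <date> were omitted, if an <item> appeared before <date>, or if no <item> were present.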
The above element declaration examples are illustrative of an element-only content model, which means that these elements can have only other elements as content. Obviously all elements declared in the EAD DTD cannot follow this content model, since we must be able to put the textual data that comprises an archival finding aid somewhere within individual EAD-encoded document instances. An SGML-based encoding scheme such as EAD uses the term PCDATA to indicate that "parsed character data" is allowed in the content model for an element. You may not think of the text of your finding aid as "parsed character data," but that is what it is to an SGML-aware software package once your finding aid text is part of an EAD-encoded document. Any text that the content model for an element defines as PCDATA must be parsed by the software in order to determine that it is not markup. The software cannot assume automatically that this element content is or is not markup, and so it must resolve (analyze) its component parts.
SGML-based content models use the term CDATA (character data) to indicate to processing software that data allowed in certain places will never contain markup and therefore does not have to be parsed in order to be validated. One common example of the use of CDATA that will be discussed in section 6.4 is supplying attribute values, which can never contain other markup or character entities.
The Abstract <abstract> element provides an illustration of a mixed content model that can contain both textual data and other elements or markup. The following example illustrates the content model for <abstract> as established in the EAD DTD:
<!ELEMENT abstract (#PCDATA | ptr | extptr | emph | lb | abbr | expan | ref | extref | linkgrp | bibref | title | archref)*>
This element declaration states that <abstract> has no required content and can contain parsed character data or any of the enumerated elements in any order and as many times as they are needed. In an EAD document, this means that you can put text and the enumerated tags in any combination that you wish between the start-tag <abstract> and the end-tag </abstract> (see section 184.108.40.206.6 for more information on the use of <abstract>).
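A minimal sketch of mixed content under this declaration, with invented text: character data is interleaved with two of the enumerated subelements, <emph> and <title>.

```xml
<abstract>Letters and diaries of a <emph>hypothetical</emph> family,
some of which were later published in <title>An Invented Anthology</title>.</abstract>
```

Because the content model ends in an asterisk, an <abstract> containing only text, only subelements, or nothing at all would also satisfy the declaration.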
An SGML-based markup system can define either content-bearing elements, as discussed above, or empty elements. The vast majority of elements defined by the EAD DTD contain either other elements or text. Several, in what is known as "mixed content," contain both. A few others, such as the element Pointer <ptr>, are defined as EMPTY. The declaration for this element specifies the following content model:
<!ELEMENT ptr EMPTY>
Empty elements can contain no textual or element content and are tagged using only the start-tag, not the end-tag. SGML processing software knows not to look for an end-tag (see section 220.127.116.11 for information on XML syntax for empty elements). The principal value of the capacity to declare empty elements in an SGML system is to gain access to the attributes available for those elements; this is especially useful for elements that facilitate both internal and external linking from a particular spot in a document. You might use such an element for creating cross references between different points within an encoded finding aid or to create a link to digitized facsimiles of items from the collection described in a finding aid. Chapter 7 contains encoded examples and a more in-depth discussion of EAD's linking features.
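A sketch of how an empty element might look in an instance. It assumes that <ptr> carries a TARGET attribute holding the ID of another element in the same finding aid and that <ptr> is permitted inside <p>; linking attributes are covered in chapter 7, not in this chapter, and the ID value ser2 is invented.

```xml
<!-- SGML syntax: start-tag only, no end-tag -->
<p>See also the correspondence series <ptr target="ser2">.</p>

<!-- XML syntax: the empty element closes itself with a trailing slash -->
<p>See also the correspondence series <ptr target="ser2"/>.</p>
```

In both forms the element carries no content of its own; everything it contributes is expressed through its attributes.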
Attributes are always related to a previously declared element in a DTD. In other words, the SGML standard does not permit an attribute to stand alone without an element for which it provides some qualification. Attributes provide metainformation about the data content delineated by a particular element and can only appear after the element name within start-tags in an SGML-based encoding system. Attributes are placed into start-tags in the following manner:
<element_name attribute_name="attribute_value">
For example:
<c level="series">
For example, the Component element, delineated by the start-tag <c> and the end-tag </c>, provides, through its subelements and the textual data they contain, a variety of identification and contextual information about a particular component in an archival description. The LEVEL attribute on that component is the means through which the EAD DTD provides for the encoding of metainformation about the level of a particular descriptive component (collection, series, file, item) within the larger encoded archival description.
An attribute declaration in a DTD is composed of the string <!ATTLIST, followed by an element name indicating the element that the attribute will be modifying, followed by the attribute name, followed by the specification of allowable attribute value(s), followed by either a default value or an attribute type.
<!ATTLIST element_name ATTRIBUTE_NAME value(s) type_or_default>
Figure 6.4a provides two examples of EAD element and attribute declarations:
1. <!ELEMENT editionstmt (edition | p)+>
   <!ATTLIST editionstmt
      id                ID                                  #IMPLIED
      altrender         CDATA                               #IMPLIED
      audience          (external | internal)               #IMPLIED
      encodinganalog    CDATA                               #IMPLIED>
2. <!ELEMENT archdesc (runner*, did, (admininfo | bioghist | controlaccess | odd | scopecontent | organization | arrangement | add | dsc | dao | daogrp | note)*)>
   <!ATTLIST archdesc
      id                ID                                  #IMPLIED
      altrender         CDATA                               #IMPLIED
      audience          (external | internal)               #IMPLIED
      type              (inventory | register | othertype)  #IMPLIED
      othertype         CDATA                               #IMPLIED
      level             (series | collection | file | fonds | item | otherlevel | recordgrp | subgrp | subseries)  #REQUIRED
      otherlevel        CDATA                               #IMPLIED
      langmaterial      CDATA                               #IMPLIED
      legalstatus       (public | private | otherlegalstatus)  #IMPLIED
      otherlegalstatus  CDATA                               #IMPLIED
      encodinganalog    CDATA                               #IMPLIED
      relatedencoding   CDATA                               #IMPLIED>
Figure 6.4a. EAD DTD element and attribute declarations for <editionstmt> and <archdesc>. The three columns in the attribute declarations represent, in order, the name of the attribute, the content model, and the value designation.
The LEVEL attribute for the Archival Description <archdesc> element provides a good example of both constrained and unconstrained values (see the second attribute declaration example in figure 6.4a). The attribute declaration provides the following closed list of possible values for this attribute, thus constraining the choices an archivist can make: collection, file, fonds, item, otherlevel, recordgrp, series, subgrp, subseries. Providing a closed list of values makes inputting those values easier in SGML-aware authoring systems (see chapter 4 for more information on EAD authoring) and ensures consistency of attribute values across repositories for certain important types of information that may be crucial to union databases of encoded finding aids. There are, however, other legitimate names that some repositories may use for levels of archival description. While the list above is closed, one of the choices is OTHERLEVEL, another attribute that is declared in the DTD with a content model of CDATA, meaning that its value is unconstrained (see section 3.5.1 for a discussion of encoding the LEVEL attribute in <archdesc>).
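The interplay of the constrained LEVEL attribute and the unconstrained OTHERLEVEL attribute might look like this in practice. Only the start-tags are shown, and the local term "accession" is a hypothetical example of a level name outside the closed list.

```xml
<!-- Value taken directly from the closed list -->
<archdesc level="collection">

<!-- Local term: choose "otherlevel" from the list, then name the level freely -->
<archdesc level="otherlevel" otherlevel="accession">
```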
When attribute values are encoded within tags, they are treated by an SGML-aware processing system as literal values. This term denotes a string of characters enclosed between either single (') or double (") quotation marks that will not be broken down further for processing. For example, an encoder cannot use an entity reference as the content of an attribute value and expect that the processing software will recognize and resolve that entity (entities are discussed in section 6.5).
Attribute values can be acted upon by SGML-aware processing software in a variety of ways:
In an individual document instance it is important to remember that, unlike the textual data that is the content of many elements, the actual data values of attributes are not immediately available to the end user of that encoded document instance. In order that certain attribute values display to an end user (for example, the LANGMATERIAL attribute value of <archdesc> or the LABEL attribute value of a <did> subelement), you must be using a system or stylesheet that can act upon attribute values and transform them for display (see section 5.3.3 for more on stylesheets).
From a technical perspective, the most striking difference between attributes and elements is the fact that elements, as we have seen, can contain either text or other elements, while attributes can never contain other elements. Attribute values, as previously noted, are expressed in terms of CDATA rather than PCDATA so that an SGML-aware processing software package attempting to validate an encoded document instance will never have to parse those attribute values in search of further markup. Also, as you can see in the attribute declaration examples in figure 6.4a, there is no means in the attribute declaration to control the order in which attributes should occur. In an EAD-encoded document, you may therefore place the declared attributes for any given tag in any order you wish, while elements must be encoded in the order (if any) specified by the element declarations in the DTD.
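To illustrate the point about ordering, the two start-tags below are alternative ways of writing the same tag; a parser treats them identically. The attribute values, including the ID a1, are invented.

```xml
<archdesc level="series" audience="external" id="a1">

<archdesc id="a1" audience="external" level="series">
```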
The examples in figure 6.4a illustrate the variety of attribute values that can be declared in a DTD. The attribute ALTRENDER, which allows an encoder to indicate output rendering preferences to processing software, has a value designation of CDATA, meaning that it is unconstrained and an encoder therefore can assign it whatever value is needed. The attribute AUDIENCE, on the other hand, is constrained to one of two values supplied in a closed list, "internal" or "external." The attribute ID has a value designation of ID, which is an SGML term for a string of characters that begins with an upper- or lower-case letter, contains no whitespace, and is composed only of alphanumeric characters, underscores (_), hyphens (-), colons (:), and full stops (.). A further requirement for attributes with an ID value designation is that their value must be unique within a particular encoded document instance and that there can only be one such attribute per element.
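Applying the rules for ID values as stated above, a few hypothetical values can be sorted into acceptable and unacceptable. The <editionstmt> element is used only because figure 6.4a shows that it carries an ID attribute; all of the values are invented.

```xml
<!-- Acceptable: begins with a letter, no whitespace, permitted characters only -->
<editionstmt id="ed2">
<editionstmt id="ed.1998-rev_a">

<!-- Not acceptable: begins with a digit -->
<editionstmt id="2nd-edition">

<!-- Not acceptable: contains whitespace -->
<editionstmt id="second edition">
```

Each acceptable value could be used only once within a single document instance.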
Attribute types are specified at the end of each attribute declaration line in the <!ATTLIST examples in figure 6.4a. The vast majority of the attributes in the EAD DTD are declared as IMPLIED, meaning that individual SGML-based systems may imply values for them if not otherwise declared, but that the DTD will not enforce their occurrence. You will notice in the <!ATTLIST declaration example for <archdesc> that the attribute LEVEL is declared as REQUIRED, which means that a parser will not validate an EAD instance when this attribute is missing.
Once an entity has been declared within a document instance, the encoder can use the abbreviated name as many times as necessary. Processing software, when encountering the abbreviated name, will expand the abbreviation to whatever the entity declaration references. How the entity expansion behaves is chiefly determined by the processing software, but an encoder can often use markup to provide some direction to the software. This is discussed at greater length in chapter 7 on linking elements.
Declaration:
<!ENTITY tp-address PUBLIC "-//ABC University::Special Collections Library//TEXT (titlepage: name and address)//EN" "tpspcoll.sgm">
Expansion:
<list type="simple">
<head>Repository Address</head>
<item>Special Collections Library</item>
<item>ABC University</item>
<item>Main Library, 40 Circle Drive</item>
<item>Ourtown, Pennsylvania</item>
<item>17654 USA</item>
</list>
Figure 6.5a. An example of an entity declaration followed by the entity expansion.
Goldfarb and Prescod provide a helpful, if perhaps oversimplified, analogy for entities. They suggest thinking of an entity as a box with a label. The box contains some specified text or data, while the label (the abbreviated name) offers a shorthand way of referring to the box. (102) Entities can range from simple to complex, but they provide powerful ways to increase efficiency, avoid redundancy, and incorporate non-SGML data into encoded document instances.
Entities can be one of two types: parameter or general.
<!ENTITY % entity_name entity_value>
Parameter entities must be declared in DTDs before they can be referenced elsewhere in the text of the DTD. A DTD writer would reference a parameter entity as follows:
%entity_name;
The following example provides an illustration of a parameter entity declaration taken from the EAD DTD:
<!ENTITY % a.common 'id ID #IMPLIED altrender CDATA #IMPLIED audience (external | internal) #IMPLIED'>
You may notice from prior attribute examples that the literal single-quoted string above looks suspiciously like the contents of an <!ATTLIST declaration, which it is. This entity declaration example basically states that wherever an application reading this DTD encounters the reference "%a.common;" it should substitute the attribute list that appears between the single quotation marks in the above example. This particular parameter entity is referenced frequently throughout the EAD DTD, since the attributes ID, ALTRENDER, and AUDIENCE are available for the modification of the majority of EAD elements. The very first element declared in the "Encoded Archival Description Element Declarations" section of the EAD DTD appears as follows:
<!ELEMENT ead (eadheader, frontmatter?, archdesc)>
<!ATTLIST ead
   %a.common;
   relatedencoding CDATA #IMPLIED>
At the point at which an SGML-aware software package encounters this parameter entity reference, it will expand the reference so that an encoder utilizing an <ead> tag actually has four available attributes for that tag, RELATEDENCODING plus the three defined by the "a.common" entity declaration. This expansion of the entity reference is one of the steps all SGML-aware software packages must take prior to processing any SGML-compliant encoded text.
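After expansion, the attribute list for <ead> is equivalent to the following declaration, obtained by substituting the text of the "a.common" entity shown earlier for the reference. This is a sketch of the result, not a quotation from the DTD.

```xml
<!ATTLIST ead
    id               ID                     #IMPLIED
    altrender        CDATA                  #IMPLIED
    audience         (external | internal)  #IMPLIED
    relatedencoding  CDATA                  #IMPLIED>
```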
There is one final point to make about the above example that illustrates an important property of entities. The example illustrates an internal entity declaration, which means that the text for the expansion of the entity reference is declared as part of the entity declaration itself. If that declaration had referenced a text or data file stored outside of the file in which the reference is included (we will see an example of this shortly), it would be an example of an external entity declaration.
The percent sign (%) in both the entity declaration and at the beginning of the entity reference in the examples above is what identifies these as parameter, and not general, entities.
Added Latin 1
Added Latin 2
Greek Symbols
Alternative Greek Symbols
Greek Letters
Monotoniko Greek
Diacritical Marks
Numeric and Special Graphics
Publishing
General Technical
Note that the character entity set references in the EAD DTD do not by themselves make these character entities available for use in EAD instances. You must have these character sets available in your SGML system in order to make them work in EAD. This is a topic that should be discussed with your system administrator if you plan to use character entities in your encoded finding aids.
In SGML, special typographic and graphic characters are represented with SDATA (specific character data) entity declarations, which provide the abbreviations you will use in character entity references within your encoded documents. These entity declarations appear in the SGML ISO character entity mapping tables that must be a part of your system if you are going to reference nonkeyboard characters. In your EAD instances you can reference individual character entities in the following manner:
As an alternative to the SDATA abbreviation you can also use the decimal number assigned to the character in the ISO character entity set in use, for example:
|Desired Character or Symbol||Character Entity Reference: ISO SDATA Abbreviation||Character Entity Reference: ISO Decimal Reference|
|©||&copy;||&#169;|
XML uses the functionality of Unicode to replace SGML's SDATA-based character entity scheme. Unicode is designed to be all encompassing, incorporating all diacritics, symbols and characters into a single character entity set.
As explained by Goldfarb and Prescod:
If you are a native English speaker you may only need the fifty-two upper- and lower-case characters, some punctuation, and a few accented characters. The pervasive 7 bit ASCII character set caters to this market. It has just enough characters (128) for all of the letters, symbols, some accented characters and some other oddments. ASCII is both a character set and a character encoding. It defines what set of characters is available and how they are to be encoded in terms of bits and bytes. XML's character set is Unicode, a sort of ASCII on steroids. Unicode includes thousands of useful characters from languages around the world. However, the first 128 characters of Unicode are compatible with ASCII and there is a character encoding of Unicode, UTF-8, that is compatible with 7 bit ASCII. This means that at the bits and bytes level, the first 128 characters of UTF-8 Unicode and 7 bit ASCII are the same. This feature of Unicode allows authors to use standard plain-text editors to create XML immediately. (104)
A character or symbol can be referenced using either the decimal number assigned to it in Unicode or its hexadecimal alphanumeric reference in the following manner:
Decimal: &#somenumber;
Hexadecimal: &#xsomenumber;
|Desired Character or Symbol||Character Entity Reference: Unicode Decimal Reference||Character Entity Reference: Unicode Hexadecimal Reference|
|©||&#169;||&#xA9;|
SGML and SGML systems as they currently exist do not recognize the hexadecimal alphanumeric references, though XML systems do. Furthermore, SGML systems only recognize the Unicode numeric references for the 128 7-bit ASCII characters. Work is currently underway to alter the SGML standard to fully recognize the Unicode character entity set. EAD implementers using SGML software should use the ISO SDATA abbreviations when including character entity references in their EAD instances. When XML-compliant mapping tables become available, it will be easy to swap these for the SGML ISO tables in the system without necessitating any markup changes.
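In running text, the two approaches might look as follows. This is a sketch using the copyright sign as the example character; the sentence itself is invented, and the first form assumes an SGML system in which the ISO entity sets are installed.

```xml
<!-- SGML system: ISO SDATA abbreviation -->
<p>Copyright &copy; 1999 ABC University.</p>

<!-- XML system: Unicode decimal character reference -->
<p>Copyright &#169; 1999 ABC University.</p>

<!-- XML system: Unicode hexadecimal character reference -->
<p>Copyright &#xA9; 1999 ABC University.</p>
```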
One other point worth mentioning is that although special characters can be included in EAD documents using SDATA abbreviations or decimal and hexadecimal references, many search engines cannot search these entity references, which may cause searches to fail. Until the time when Unicode becomes a standard in use in all of the various software packages utilized in encoding, manipulating, and delivering encoded instances, there will be disparities in how different software expresses special characters. A repository must consider the importance of being able to index and display these special characters in the light of the difficulty in maintaining them through the various stages of the markup and delivery of encoded finding aids.
Both the DOCTYPE and ENTITY declarations shown in figure 18.104.22.168a contain quote-delimited external identifiers. External identifiers can either be public or system identifiers. Public identifiers provide a form of destination address that is not specific to any one system. Use of a public identifier relies on the SGML system to resolve the nonspecific address to a specific one where the referenced file can be found. Public identifiers are provided for in the SGML standard (ISO 8879), while the syntax for Formal Public Identifiers (FPI) (105), a subset of public identifiers, is specified in the ISO 9070 standard. EAD does not require that all public identifiers be FPIs.
If a public identifier is used (indicated by the keyword PUBLIC), it may be followed by a system identifier in the form of a Uniform Resource Identifier (URI) for the resource. A URI is a broader construct that includes as a subset the more familiar URL (Uniform Resource Locator). (106) URLs, and more broadly URIs, are the addressing mechanisms that facilitate pointing to resources on the World Wide Web. The keyword SYSTEM, instead of PUBLIC, will precede an external identifier when only a URI, and no public identifier, is given for the file being referenced. Both keywords are important components in document type declarations and external entity declarations. The relative merits of public identifiers, system identifiers, and a combination of the two are discussed in section 22.214.171.124.1.
Entities like the one shown in figure 126.96.36.199a will be discussed shortly in greater detail. Once an entity has been declared in the declaration subset of a document instance, it can be referenced in the document instance itself at any point where markup is permitted by the DTD (it cannot be used in a place where content has been declared as CDATA). The entity declared in figure 188.8.131.52a would be referenced as follows:
<!ENTITY entity_name "specification_of_content">
Note that the keywords PUBLIC and SYSTEM are not necessary in internal entity declarations. Internal entities are only useful for text that is used repetitively within a particular encoded finding aid; they cannot be referenced from other document instances. For example, a repository might utilize a general internal entity in encoding a finding aid in which the name of the organization whose records are described in the finding aid is long, complex, and subject to typographical errors. In such a case, the encoder might declare an entity in the document type declaration as follows:
<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN" [
<!ENTITY stuffsoc "Society for the Preservation, Beautification, and General Betterment of the Stufftown Memorial Stuffatorium">
]>
The encoder could then, at multiple points throughout the EAD instance, reference this declared entity as follows:
<origination><corpname>&stuffsoc;</corpname></origination>
[...]
<prefercite><head>Preferred Citation</head><p>[identification of item], &stuffsoc; Records, Stufftown Memorial Stuffatorium, Stufftown, NS.</p></prefercite>
[...]
<bioghist><p>The &stuffsoc; was established in 1872 by the town council of Stufftown and was endowed initially with a fund to cover operating and acquisition expenses through the generous benefaction of ... </p></bioghist>
Any SGML-aware software encountering these entity references in processing can expand them to the full text provided in the entity declaration prior to processing the encoded instance.
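For instance, after expansion the first reference in the example above would be processed as if the encoder had typed the full name directly:

```xml
<origination><corpname>Society for the Preservation, Beautification, and General Betterment of the Stufftown Memorial Stuffatorium</corpname></origination>
```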
The next two sections describe the details of the entity declarations that are used to specify parsed or unparsed data.
Using entities in this way can assist a repository in the management of frequently updated information that appears widely across its encoded finding aids. Instead of entering such information as a part of each EAD instance, it can be stored as a separate file and referred to from within each instance using an entity reference. Figure 184.108.40.206.1a below provides an example of contact information that is stored as a separate file, tpncdsp.sgm, so that it can be referenced as an entity in each of the repository's finding aids. Updating this single file when, for example, the repository's area code changes, would change the contact information in all of the repository's encoded finding aids. If the area code information had been hard-coded into each of the individual files, such an update would be much more labor-intensive.
<list type="simple">
  <head>Contact Information</head>
  <item>Rare Book, Manuscript, and Special Collections Library</item>
  <item>Duke University</item>
  <item>P.O. Box 90185</item>
  <item>Durham, North Carolina</item>
  <item>27708-0185 USA</item>
  <item>Phone: 919/660-5822</item>
  <item>Fax: 919/660-5934</item>
  <item>Email: firstname.lastname@example.org</item>
  <item>URL: http://scriptorium.lib.duke.edu/</item>
</list>
Figure: The content of the file tpncdsp.sgm.
The following is an example of an entity declaration that references the file tpncdsp.sgm using only a public identifier:
<!ENTITY tp-ncd-spcoll PUBLIC "-//Duke University::Rare Book, Manuscript, and Special Collections Library//TEXT (titlepage: name and address)//EN">
This form of entity declaration is valid only in SGML systems. In XML a system identifier must be supplied as well, as illustrated in the following example, in which a relative URI is used:
<!ENTITY tp-ncd-spcoll PUBLIC "-//Duke University::Rare Book, Manuscript, and Special Collections Library//TEXT (titlepage: name and address)//EN" "tpncdsp.sgm">
Finally, the information in the file tpncdsp.sgm shown in the figure above could be declared as an entity using only a system identifier. This approach would also be valid in XML. In the example below the system identifier is given as an absolute URI:
<!ENTITY tp-ncd-spcoll SYSTEM "http://scriptorium.lib.duke.edu/eadfiles/tpncdsp.sgm">
The choice of whether to use a system identifier (a relative or absolute URI) or a public identifier (either an FPI or a less formal local public identifier) is largely determined by the system or systems in which you are storing and delivering encoded finding aid data (see section 5.4 for a related discussion of file management). Use of a relative URI, one that does not give the entire address of the referenced file beginning with the transfer protocol used (such as http://), assumes that all referenced files will inhabit a stable directory structure regardless of the server on which they reside. Relative URIs may be problematic for managers of union databases of encoded finding aids; archivists planning to submit their finding aids to such collaborative projects should consult the systems manager of the union database before deciding to use relative URIs. Using an absolute URI commits you to some file maintenance overhead any time the referenced files are moved to a new server, since the entire address in each URI will have to be edited. However, a simple "find and replace" routine will probably alleviate this overhead in most cases.
Use of a public identifier assumes the existence of an SGML catalog file (107) to which a system can turn in order to map, or resolve, that public identifier into a URI. The great strength of a file management system based on public identifiers is that changes in file locations on or among servers can easily be accommodated by changing a single address in the catalog file, rather than changing the entity reference in each individual EAD instance. Planning for future storage and delivery system possibilities requires careful thought as you decide which addressing approach to adopt. A fuller discussion of various options for providing file addresses in external entity declarations is provided in section 7.5.
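A catalog entry simply pairs a public identifier with the address to which it should resolve. The sketch below is a minimal illustration only. It assumes the later OASIS XML Catalogs syntax rather than the SGML catalog format referred to above, in which the corresponding entry consists of the keyword PUBLIC followed by the same two quoted strings. The address shown is the one used in the absolute URI example earlier in this section; an actual catalog would list whatever location your own server uses.

```xml
<?xml version="1.0"?>
<!-- Assumed syntax: OASIS XML Catalogs. One entry maps one public identifier to one address. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- If tpncdsp.sgm moves, only the uri value below needs to change;
       the entity declarations in the individual EAD instances stay as they are. -->
  <public
      publicId="-//Duke University::Rare Book, Manuscript, and Special Collections Library//TEXT (titlepage: name and address)//EN"
      uri="http://scriptorium.lib.duke.edu/eadfiles/tpncdsp.sgm"/>
</catalog>
```

Whether your software consults a catalog of this kind, and where it expects to find it, depends on the system in use.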
There is an important difference between referencing files containing encoding excerpts, such as the file tpncdsp.sgm illustrated in the figure above, in SGML systems as compared to XML systems. In the former, any chunk of text can be excerpted for reuse provided that it is encoded using the same DTD as the document instance in which the file will be referenced as an entity. XML imposes the additional requirement that the excerpted chunk of encoded text be "well-formed," meaning that it must have a single parent element. The file tpncdsp.sgm meets this XML requirement, since the tags <list> and </list> enclose all of the other information in the file. If the <list> and </list> tags were removed, however, this file would no longer satisfy the well-formed requirement in XML.
All general external entity declarations, regardless of whether they utilize a public or a system identifier, can be referenced in encoded document instances in the same way that the general internal entity was referenced earlier (see the discussion of general internal entities above), as shown in the following example:
<publisher>Rare Book, Manuscript, and Special Collections Library<lb> Duke University<lb> Durham, North Carolina</publisher> &tp-ncd-spcoll;
The SGML standard specifies that declarations for entities not intended to be parsed use the keyword NDATA followed by a notation name, which indicates to application software the type of external file that the entity declaration references. A notation declaration, either in the DTD or in the prolog of an EAD instance, specifies a notation name and a formal public identifier (FPI) for the type of NDATA file used, and it must occur before you can declare a general external entity that uses that notation. The file eadnotat.ent, one of the files affiliated with the EAD DTD, includes a series of standard notation declarations. It provides notations for SGML as well as for a number of common non-SGML data types, including HTML, JPEG, MPEG, XML, PCX, GIF, and TIFF. Because these notations are declared in the EAD DTD, you can declare external entities in the prolog of your document instance that reference such files. Each notation declaration contains a notation name, which you will need in order to declare entities that reference that type of file. Notation declarations follow the same format as general external entity declarations; the notation declaration for GIF files in the EAD DTD is one example.
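The general shape of a notation declaration, and its position ahead of any entity declaration that uses it, can be sketched as follows. This is an illustration under stated assumptions, not a copy of the EAD DTD: the public identifier is a hypothetical placeholder, and the stripped-down document omits the reference to the EAD DTD in which the notation would ordinarily already be declared. Consult the notation file distributed with the DTD for the wording actually used.

```xml
<?xml version="1.0"?>
<!DOCTYPE ead [
  <!-- Hypothetical public identifier, shown only to illustrate the format:
       keyword, notation name, then an identifier for the data type. -->
  <!NOTATION gif PUBLIC "-//Example Repository//NOTATION Graphics Interchange Format//EN">

  <!-- The notation name "gif" can be used after NDATA only because
       the notation has been declared above. -->
  <!ENTITY dukeseal SYSTEM "dukeseal.gif" NDATA gif>
]>
<ead>
  <!-- remainder of the encoded finding aid omitted -->
</ead>
```

When the EAD DTD itself is referenced in the document type declaration, the notation declaration need not be repeated in the instance; only the entity declaration is required.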
When encoding your repository's finding aids, you may wish to include a GIF image of your repository seal or of the repository itself. This can be done by creating a general external entity declaration in the declaration subset of the prolog of each EAD instance. This entity declaration, as previously noted, must provide an address for the image file on your server using a public identifier, a system identifier, or both. Such a declaration, using an absolute URI as its system identifier, might look like this:
<!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/sgmlstd/panorama/lcseal.gif" NDATA gif>
The following is an example of an entity declaration for the same purpose that uses both a public and a system identifier, in this case a relative URI:
<!ENTITY dukeseal PUBLIC "-//Duke University::Rare Book, Manuscript, and Special Collections Library//NONSGML (dukeseal)//EN" "dukeseal.gif" NDATA gif>
By including the NDATA keyword following the URI, you are signaling SGML-aware processing software that the referenced file contains data that should not be processed with your EAD instance. By providing the declared notation name for a data type after the NDATA keyword, you are further giving the processing software a clue about how it might handle this data that it is not supposed to parse.
Because this entity is external to your document instance and not intended to be parsed as part of it, you would not refer to it in your document using the direct entity reference format (an ampersand, the entity name, and a semicolon) discussed above for parsed entities. Instead, you would use one of EAD's linking elements with an ENTITYREF attribute to provide a reference to the external entity (see section 7.3 for a fuller discussion of EAD's external linking elements and attributes).
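A reference of this kind might be sketched as follows. The example assumes the <extptr> linking element placed inside a paragraph, reuses the dukeseal entity name from the declaration above, and substitutes a hypothetical notation identifier so that the skeleton can stand alone. It is well-formed XML but deliberately not a complete or valid finding aid, and the element placement is illustrative only.

```xml
<?xml version="1.0"?>
<!DOCTYPE ead [
  <!-- Hypothetical notation identifier; normally supplied by the EAD DTD. -->
  <!NOTATION gif PUBLIC "-//Example Repository//NOTATION Graphics Interchange Format//EN">
  <!-- Unparsed external entity: the image is named here, not pulled into the text. -->
  <!ENTITY dukeseal SYSTEM "dukeseal.gif" NDATA gif>
]>
<ead>
  <!-- remainder of the encoded finding aid omitted -->
  <!-- The attribute value is the bare entity name: no ampersand, no semicolon. -->
  <p><extptr entityref="dukeseal"/></p>
</ead>
```

In an SGML rather than an XML system the empty element would be written without the closing slash. How the image is actually displayed is left to the processing software.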