LC METS Profile
METS Profile for Historical Newspapers [Draft]
Abstract
The METS Profile for Historical Newspapers specifies how METS documents representing digitized historical newspapers should
be encoded. Note that the profile is to be used to represent a single issue of a newspaper. The profile uses MODS to express
the logical structure of a newspaper issue, and uses the METS structMap to express the physical structure of the newspaper
issue.
URI
http://www.loc.gov/standards/mets/profiles/LC_profiles/newspaper/00000010.xml
Date
2006-02-06T17:31:00
Contact
Morgan V. Cundiff
Network Development and MARC Standards Office, Library of Congress
101 Independence Avenue, Washington DC, 20540
Related Profile
No related profiles.
Extension Schema
Metadata Object Description Schema (MODS)
http://www.loc.gov/mods/v3/
Extension Schema
Metadata for Images in XML (MIX)
http://www.loc.gov/mix/
Extension Schema
PREMIS
http://www.loc.gov/standards/premis
Description Rules
It is recommended that the content data used in the MODS section of a METS document conform to AACR2 to the extent that it
applies.
Controlled Vocabularies
Library of Congress Subject Headings
Library of Congress
No URI established.
Controlled subject headings used in the MODS section of a METS document must be formulated according to the Library of Congress
Subject Headings (LCSH).
Library of Congress Classification
Library of Congress
http://www.loc.gov/catdir/cpso/lcc.html
NACO Authority File
Library of Congress
http://authorities.loc.gov/
Controlled name and title headings used in the MODS section of a METS document must be formulated according to the NACO Authority
File.
MARC Country Codes
Library of Congress
http://www.loc.gov/marc/countries/
Country codes used in the MODS section of a METS document must use the MARC country code list.
ISO 639-2 Language Codes
Library of Congress
http://www.loc.gov/standards/iso639-2/
Language codes used in the MODS section of a METS document must use ISO 639-2 bibliographic codes.
MARC Relator Codes
Library of Congress
http://www.loc.gov/marc/relators/
Relator codes or terms used in the MODS section of a METS document must use the MARC relator list.
Target Audience Codes
Library of Congress
http://www.loc.gov/marc/sourcecode/target/
Terms used in the MODS section of a METS document must use the MARC target audience list.
MARC Code List for Organizations
Library of Congress
http://www.loc.gov/marc/organizations/
Source codes used in the MODS section of a METS document must use the MARC organization list.
Structural Requirements
METS Root Element
metsRootElement Requirement 1
The METS root element must have an attribute PROFILE="http://www.loc.gov/mets/profiles/00000010.xml".
Example 1.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
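The root-element requirement can be checked mechanically. The following is a minimal Python sketch, not part of the profile itself. It assumes the standard METS namespace URI (http://www.loc.gov/METS/), which the abbreviated examples in this document do not declare; the function name has_required_profile and the SAMPLE string are illustrative only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # assumed: standard METS namespace
PROFILE_URI = "http://www.loc.gov/mets/profiles/00000010.xml"


def has_required_profile(mets_xml):
    """Return True if the root is mets:mets and carries the expected PROFILE value."""
    root = ET.fromstring(mets_xml)
    return root.tag == f"{{{METS_NS}}}mets" and root.get("PROFILE") == PROFILE_URI


# Hypothetical sample modelled on Example 1, with the namespace declared.
SAMPLE = f"""<mets:mets xmlns:mets="{METS_NS}" PROFILE="{PROFILE_URI}">
  <mets:structMap><mets:div /></mets:structMap>
</mets:mets>"""

print(has_required_profile(SAMPLE))
```

The check compares the attribute value as an exact string, which is the strictest reading of the requirement.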
Descriptive Metadata Section
dmdSec Requirement 1
A document must contain three Descriptive Metadata Sections (dmdSec). Each dmdSec element must have an ID attribute. The first
dmdSec element will contain a reference (mdRef) to a bibliographic record in an external system for the printed version of
the newspaper. An mdRef element must have an ID attribute.
Example 3.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd01">
<mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_print" />
</mets:dmdSec>
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
dmdSec Requirement 2
The second dmdSec element will contain a reference (mdRef) to a bibliographic record in an external system for the digital
version of the newspaper.
Example 4.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd02">
<mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_digital" />
</mets:dmdSec>
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
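dmdSec Requirements 1 and 2 lend themselves to a simple structural check. The sketch below is one possible way to do it in Python, assuming the standard METS and XLink namespace URIs; the function name check_dmdsecs, the dmdSec ID values, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace


def check_dmdsecs(mets_xml):
    """Return a list of problems found against dmdSec Requirements 1 and 2."""
    root = ET.fromstring(mets_xml)
    dmdsecs = root.findall("mets:dmdSec", NS)
    problems = []
    if len(dmdsecs) != 3:
        problems.append(f"expected 3 dmdSec elements, found {len(dmdsecs)}")
    for position, sec in enumerate(dmdsecs, start=1):
        if sec.get("ID") is None:
            problems.append(f"dmdSec {position} has no ID attribute")
    # The first two sections reference external records through mdRef.
    for position, sec in enumerate(dmdsecs[:2], start=1):
        ref = sec.find("mets:mdRef", NS)
        if ref is None:
            problems.append(f"dmdSec {position} has no mdRef child")
        elif ref.get("ID") is None:
            problems.append(f"mdRef in dmdSec {position} has no ID attribute")
    return problems


# Hypothetical sample: two mdRef sections followed by a wrapped section.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="dmd01">
    <mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_print" />
  </mets:dmdSec>
  <mets:dmdSec ID="dmd02">
    <mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_digital" />
  </mets:dmdSec>
  <mets:dmdSec ID="dmd03">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData /></mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""

print(check_dmdsecs(SAMPLE))
```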
dmdSec Requirement 3
The third dmdSec contains a wrapped (mdWrap) MODS record for the digitized issue of the newspaper. The MODS record must contain
the title, issue date, genre (with value "newspaper") and language. The mods element must have an ID attribute.
Example 5.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03a">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex05">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
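The required fields of the issue-level MODS record can be verified with a short script. The following Python sketch is illustrative only: it assumes the MODS version 3 namespace URI, checks only the date-only W3CDTF forms (YYYY, YYYY-MM, YYYY-MM-DD), and uses a hypothetical function name and sample record. The sample date is written in YYYY-MM-DD form, which is what the w3cdtf encoding value calls for.

```python
import re
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}  # assumed: MODS v3 namespace
# Date-only W3CDTF forms; forms that include a time are not covered here.
W3CDTF_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_issue_mods(mods_xml):
    """Return a list of problems found against dmdSec Requirement 3."""
    mods = ET.fromstring(mods_xml)
    problems = []
    if mods.get("ID") is None:
        problems.append("mods element has no ID attribute")
    if mods.find("mods:titleInfo/mods:title", NS) is None:
        problems.append("missing title")
    if "newspaper" not in [g.text for g in mods.findall("mods:genre", NS)]:
        problems.append('missing genre "newspaper"')
    if mods.find("mods:language/mods:languageTerm", NS) is None:
        problems.append("missing language")
    date = mods.find("mods:originInfo/mods:dateIssued", NS)
    if date is None:
        problems.append("missing issue date")
    elif date.get("encoding") == "w3cdtf" and not W3CDTF_DATE.fullmatch(date.text or ""):
        problems.append(f"dateIssued {date.text!r} is not a W3CDTF date")
    return problems


# Hypothetical sample record for a single issue.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:titleInfo><mods:title>Montags Zeitung</mods:title></mods:titleInfo>
  <mods:genre>newspaper</mods:genre>
  <mods:originInfo>
    <mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
  </mods:originInfo>
  <mods:language>
    <mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
  </mods:language>
</mods:mods>"""

print(check_issue_mods(SAMPLE))
```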
dmdSec Requirement 4
The MODS record may optionally describe the logical entities contained in a given issue of a newspaper. Logical entities include
the following: issue section (issueSection), article, article section (articleSection), photograph, illustration, and advertisement.
The logical entities are defined in Newspaper Genre Terms. [under construction]
Each logical entity will be described in the MODS record as a Related Item (relatedItem type="constituent"). Each relatedItem
type="constituent" element must have an ID attribute. Each relatedItem type="constituent" element must have a genre child
element with appropriate value (e.g. "article").
Example 6.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03b">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex06">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
<mods:relatedItem ID="DMD_article01_ex06" type="constituent">
<mods:titleInfo>
<mods:title>Wien, 10. mai.</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
</mods:relatedItem>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
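As a rough illustration of dmdSec Requirement 4, the Python sketch below lists the logical entities in a MODS record and reports any constituent that lacks an ID or a genre. It assumes the MODS version 3 namespace URI; the function name list_constituents and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}  # assumed: MODS v3 namespace


def list_constituents(mods_xml):
    """Return ([(ID, genre), ...], problems) for relatedItem type="constituent" elements."""
    mods = ET.fromstring(mods_xml)
    found, problems = [], []
    for item in mods.findall("mods:relatedItem[@type='constituent']", NS):
        item_id = item.get("ID")
        genre = item.findtext("mods:genre", default=None, namespaces=NS)
        if item_id is None:
            problems.append("constituent has no ID attribute")
        if not genre:
            problems.append(f"constituent {item_id} has no genre value")
        found.append((item_id, genre))
    return found, problems


# Hypothetical sample: an issue record holding one article.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:relatedItem ID="DMD_article01" type="constituent">
    <mods:titleInfo><mods:title>Wien, 10. mai.</mods:title></mods:titleInfo>
    <mods:genre>article</mods:genre>
  </mods:relatedItem>
</mods:mods>"""

print(list_constituents(SAMPLE))
```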
dmdSec Requirement 5
Each logical entity (i.e. each relatedItem type="constituent" element) may in turn contain structural subparts. These subparts
include: paragraph. These subparts are expressed as part elements with type attribute of value "subPartType". The subpart
types are also defined in Newspaper Genre Terms. [under construction]. In the following example, the relatedItem element (for
an article) contains a part child element for each paragraph in the article. Each part element must have an ID attribute.
Example 7.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03c">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex07">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
<mods:relatedItem ID="DMD_article01_ex07" type="constituent">
<mods:titleInfo>
<mods:title>Wien, 10. mai.</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
<mods:part ID="DMD_article01_para01_ex07" type="paragraph">
<mods:text />
</mods:part>
<mods:part ID="DMD_article01_para02_ex07" type="paragraph">
<mods:text />
</mods:part>
<mods:part ID="DMD_article01_para03_ex07" type="paragraph">
<mods:text />
</mods:part>
</mods:relatedItem>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
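The subpart structure of dmdSec Requirement 5 can be inspected in the same way. The Python sketch below, offered only as an illustration, maps each constituent to its part elements and counts parts that lack an ID. The MODS version 3 namespace URI is assumed, and the function names and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"  # assumed: MODS v3 namespace
NS = {"mods": MODS_URI}


def parts_by_constituent(mods_xml):
    """Map each constituent ID to the (part ID, part type) pairs it contains."""
    mods = ET.fromstring(mods_xml)
    result = {}
    for item in mods.findall("mods:relatedItem[@type='constituent']", NS):
        result[item.get("ID")] = [
            (part.get("ID"), part.get("type")) for part in item.findall("mods:part", NS)
        ]
    return result


def parts_without_id(mods_xml):
    """Count part elements that lack the ID attribute the profile requires."""
    mods = ET.fromstring(mods_xml)
    return sum(1 for part in mods.iter(f"{{{MODS_URI}}}part") if part.get("ID") is None)


# Hypothetical sample: one article with two paragraph subparts.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:relatedItem ID="DMD_article01" type="constituent">
    <mods:genre>article</mods:genre>
    <mods:part ID="DMD_article01_para01" type="paragraph"><mods:text /></mods:part>
    <mods:part ID="DMD_article01_para02" type="paragraph"><mods:text /></mods:part>
  </mods:relatedItem>
</mods:mods>"""

print(parts_by_constituent(SAMPLE))
print(parts_without_id(SAMPLE))
```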
Administrative Metadata Section
amdSec Requirement 1
[Example of how to use PREMIS needs work]. A METS document may optionally contain preservation-related metadata that is expressed
using the PREMIS schemas. PREMIS data can be associated with either the entire object (i.e. data that is applicable to all
component files), which is, in PREMIS terms, considered to be at the level of the object category "representation", or PREMIS
data may be associated with individual files. The following brief example shows a possible encoding for the date of a preservation
"event" (i.e. the date of digitization) for an audio file. Note that (as of this writing) PREMIS is still very new, and that
best practices for its use with METS have not yet emerged.
Example 8.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:structMap>
<mets:div />
</mets:structMap>
</mets:mets>
File Section
fileSec Requirement 1
Each fptr element in the structMap must point to the appropriate file element in the File Section (fileSec), which in turn
locates the content file. The profile makes no further requirements regarding the fileSec element. The example document in
Appendix 1 provides an example of Library of Congress practice.
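The pointer requirement above amounts to a referential-integrity check: every FILEID used in the structMap, whether on an fptr or on an area element, should match the ID of a file element in the fileSec. The Python sketch below shows one way to test this, assuming the standard METS namespace URI; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace


def unresolved_file_pointers(mets_xml):
    """Return FILEID values in the structMap that match no file ID in the fileSec."""
    root = ET.fromstring(mets_xml)
    file_ids = {f.get("ID") for f in root.iterfind("mets:fileSec//mets:file", NS)}
    missing = []
    for tag in ("fptr", "area"):
        for element in root.iterfind(f"mets:structMap//mets:{tag}", NS):
            file_id = element.get("FILEID")
            # An fptr that wraps an area carries no FILEID of its own.
            if file_id is not None and file_id not in file_ids:
                missing.append(file_id)
    return missing


# Hypothetical sample: one image pointer and one ALTO area pointer.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="IMG00001" />
      <mets:file ID="ALT00001" />
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="news:issue">
      <mets:div TYPE="news:page">
        <mets:div TYPE="news:image"><mets:fptr FILEID="IMG00001" /></mets:div>
        <mets:div TYPE="news:pageRegion">
          <mets:div TYPE="news:alto">
            <mets:fptr><mets:area FILEID="ALT00001" BEGIN="P1_TB00005" /></mets:fptr>
          </mets:div>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(unresolved_file_pointers(SAMPLE))
```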
Structure Map
structMap Requirement 1
The physical structure of the newspaper will be represented in the Structure Map (structMap) section of the METS document.
The document must have one and only one structMap. There are three physical entities that are expressed in the structMap.
They are: issue, page, and page region (pageRegion). These entities are expressed as a hierarchy of typed Division (div) elements
(div TYPE="news:issue", div TYPE="news:page", and div TYPE="news:pageRegion").
The structMap element must contain one and only one div TYPE="news:issue" child element. The div TYPE="news:issue" element
must have a DMDID attribute that points to the mods element for the issue. The div TYPE="news:issue" element will contain
one div TYPE="news:page" child element for each page in the newspaper issue. The following example shows an issue that consists
of four pages. Note that the div TYPE="news:page" elements do not require a DMDID attribute, as the physical entity "page" does
not directly correspond to a logical entity in the bibliographic description.
Example 9.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03d">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex09">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:structMap>
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex09">
<mets:div TYPE="news:page">
</mets:div>
<mets:div TYPE="news:page">
</mets:div>
<mets:div TYPE="news:page">
</mets:div>
<mets:div TYPE="news:page">
</mets:div>
</mets:div>
</mets:structMap>
</mets:mets>
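structMap Requirement 1 combines three checks: exactly one structMap, exactly one div TYPE="news:issue" inside it, and a DMDID on that div that resolves to a mods element. A minimal Python sketch of these checks follows. The METS and MODS namespace URIs are assumed, the function name and sample are hypothetical, and the DMDID is treated as a single ID.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace
MODS_TAG = "{http://www.loc.gov/mods/v3}mods"  # assumed: MODS v3 namespace


def check_issue_div(mets_xml):
    """Return (problems, page_count) for the single news:issue div."""
    root = ET.fromstring(mets_xml)
    struct_maps = root.findall("mets:structMap", NS)
    if len(struct_maps) != 1:
        return [f"expected 1 structMap, found {len(struct_maps)}"], 0
    issues = struct_maps[0].findall("mets:div[@TYPE='news:issue']", NS)
    if len(issues) != 1:
        return [f"expected 1 news:issue div, found {len(issues)}"], 0
    issue = issues[0]
    mods_ids = {m.get("ID") for m in root.iter(MODS_TAG)}
    problems = []
    dmdid = issue.get("DMDID")
    if dmdid is None or dmdid not in mods_ids:
        problems.append("news:issue DMDID does not point to a mods element")
    pages = issue.findall("mets:div[@TYPE='news:page']", NS)
    return problems, len(pages)


# Hypothetical sample: a four-page issue.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd03">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData>
      <mods:mods ID="DMD_issue" />
    </mets:xmlData></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap>
    <mets:div TYPE="news:issue" DMDID="DMD_issue">
      <mets:div TYPE="news:page" />
      <mets:div TYPE="news:page" />
      <mets:div TYPE="news:page" />
      <mets:div TYPE="news:page" />
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(check_issue_div(SAMPLE))
```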
structMap Requirement 2
Each div TYPE="news:page" element may contain a child div element for each form of digitized content that represents the page.
The three possibilities are: an image of the page (mets:div TYPE="news:image"), an alto file (mets:div TYPE="news:alto") for
the page, or a text version of the page (mets:div TYPE="news:text"). The following example shows two pages, each with a corresponding
image file and alto file. Note that the mets:div TYPE="news:image" elements and the mets:div TYPE="news:alto" elements each
contain File Pointer (fptr) elements that point to the corresponding file elements in the File Section (fileSec).
Example 10.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03e">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex10">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:fileSec>
<mets:fileGrp>
<mets:file ID="IMG00001_ex10" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0001.tif" />
</mets:file>
<mets:file ID="IMG00002_ex10" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0002.tif" />
</mets:file>
</mets:fileGrp>
<mets:fileGrp>
<mets:file ID="ALT00001_ex10" MIMETYPE="text/xml">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00001.xml" />
</mets:file>
<mets:file ID="ALT00002_ex10" MIMETYPE="text/xml">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00002.xml" />
</mets:file>
</mets:fileGrp>
</mets:fileSec>
<mets:structMap>
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex10">
<mets:div TYPE="news:page">
<mets:div TYPE="news:image">
<mets:fptr FILEID="IMG00001_ex10" />
</mets:div>
<mets:div TYPE="news:alto">
<mets:fptr FILEID="ALT00001_ex10" />
</mets:div>
</mets:div>
<mets:div TYPE="news:page">
<mets:div TYPE="news:image">
<mets:fptr FILEID="IMG00002_ex10" />
</mets:div>
<mets:div TYPE="news:alto">
<mets:fptr FILEID="ALT00002_ex10" />
</mets:div>
</mets:div>
</mets:div>
</mets:structMap>
</mets:mets>
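To show how the page-level divisions tie back to content files, the Python sketch below walks each div TYPE="news:page" and resolves the fptr of each child div to the location recorded in the fileSec. It is an illustration under assumptions: the METS and XLink namespace URIs are the standard ones, and the function name, file IDs, and relative paths in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def page_contents(mets_xml):
    """Return one dict per page, mapping each content div TYPE to its file location."""
    root = ET.fromstring(mets_xml)
    hrefs = {}
    for file_el in root.iterfind("mets:fileSec//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        hrefs[file_el.get("ID")] = flocat.get(XLINK_HREF) if flocat is not None else None
    pages = []
    for page in root.iterfind("mets:structMap//mets:div[@TYPE='news:page']", NS):
        forms = {}
        for form in page.findall("mets:div", NS):
            fptr = form.find("mets:fptr", NS)
            if fptr is not None:
                forms[form.get("TYPE")] = hrefs.get(fptr.get("FILEID"))
        pages.append(forms)
    return pages


# Hypothetical sample: one page with an image file and an ALTO file.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="IMG00001"><mets:FLocat LOCTYPE="URL" xlink:href="img/0001.tif" /></mets:file>
      <mets:file ID="ALT00001"><mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml" /></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="news:issue">
      <mets:div TYPE="news:page">
        <mets:div TYPE="news:image"><mets:fptr FILEID="IMG00001" /></mets:div>
        <mets:div TYPE="news:alto"><mets:fptr FILEID="ALT00001" /></mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(page_contents(SAMPLE))
```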
structMap Requirement 3
A div TYPE="news:page" element may contain one or more div TYPE="news:pageRegion" elements. A "pageRegion" is a portion of
a page file that corresponds to a particular logical entity (e.g. a newspaper article). In the example below the issue has
two pages with three articles. Each div TYPE="news:pageRegion" element must contain a mets:div TYPE="news:alto" child element
with an fptr element. The fptr element must contain an area child element with FILEID and BEGIN attributes. FILEID points to
the corresponding alto file, and BEGIN gives the ID attribute within the alto file that identifies the pageRegion.
Example 11.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03f">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex11">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
<mods:relatedItem ID="DMD_article01_ex11" type="constituent">
<mods:titleInfo>
<mods:title>Wien, 10. mai.</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
</mods:relatedItem>
<mods:relatedItem ID="DMD_article02_ex11" type="constituent">
<mods:titleInfo>
<mods:title>Neueste nachrichten</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
</mods:relatedItem>
<mods:relatedItem ID="DMD_article03_ex11" type="constituent">
<mods:titleInfo>
<mods:title>Neueste nachrichten</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
</mods:relatedItem>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:fileSec>
<mets:fileGrp>
<mets:file ID="IMG00001_ex11" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0001.tif" />
</mets:file>
<mets:file ID="IMG00002_ex11" MIMETYPE="image/tiff">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0002.tif" />
</mets:file>
</mets:fileGrp>
<mets:fileGrp>
<mets:file ID="ALT00001_ex11" MIMETYPE="text/xml">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00001.xml" />
</mets:file>
<mets:file ID="ALT00002_ex11" MIMETYPE="text/xml">
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00002.xml" />
</mets:file>
</mets:fileGrp>
</mets:fileSec>
<mets:structMap>
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex11">
<mets:div TYPE="news:page">
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_ex11">
<mets:div TYPE="news:alto">
<mets:fptr>
<mets:area FILEID="ALT00001_ex11" BEGIN="P1_TB00005" />
</mets:fptr>
</mets:div>
</mets:div>
</mets:div>
<mets:div TYPE="news:page">
<mets:div TYPE="news:pageRegion" DMDID="DMD_article02_ex11">
<mets:div TYPE="news:alto">
<mets:fptr>
<mets:area FILEID="ALT00002_ex11" BEGIN="P2_TB00018" />
</mets:fptr>
</mets:div>
</mets:div>
<mets:div TYPE="news:pageRegion" DMDID="DMD_article03_ex11">
<mets:div TYPE="news:alto">
<mets:fptr>
<mets:area FILEID="ALT00002_ex11" BEGIN="P2_TB00024" />
</mets:fptr>
</mets:div>
</mets:div>
</mets:div>
</mets:div>
</mets:structMap>
</mets:mets>
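The pageRegion rule can be tested by looking, inside each div TYPE="news:pageRegion", for the path news:alto div, fptr, area, and then confirming that the area carries both FILEID and BEGIN. The Python sketch below is one possible check, assuming the standard METS namespace URI; the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace


def check_page_regions(mets_xml):
    """Return a list of problems found against structMap Requirement 3."""
    root = ET.fromstring(mets_xml)
    problems = []
    for region in root.iterfind(".//mets:div[@TYPE='news:pageRegion']", NS):
        label = region.get("DMDID", "(no DMDID)")
        area = region.find("mets:div[@TYPE='news:alto']/mets:fptr/mets:area", NS)
        if area is None:
            problems.append(f"{label}: no news:alto div with fptr/area")
        elif area.get("FILEID") is None or area.get("BEGIN") is None:
            problems.append(f"{label}: area needs FILEID and BEGIN")
    return problems


# Hypothetical sample: one page region pointing into an ALTO file.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap>
    <mets:div TYPE="news:issue" DMDID="DMD_issue">
      <mets:div TYPE="news:page">
        <mets:div TYPE="news:pageRegion" DMDID="DMD_article01">
          <mets:div TYPE="news:alto">
            <mets:fptr><mets:area FILEID="ALT00001" BEGIN="P1_TB00005" /></mets:fptr>
          </mets:div>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(check_page_regions(SAMPLE))
```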
structMap Requirement 4
It is possible, using div TYPE="news:pageRegion" elements, to provide any level of access to the contents of a particular newspaper
page. The previous example provides what might be called "article level" access. The following example provides what might
be called "article and paragraph level access".
Example 12.
<mets:mets PROFILE="http://www.loc.gov/mets/profiles/00000010.xml">
<mets:dmdSec ID="dmd03g">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods ID="DMD_issue_ex12">
<mods:titleInfo>
<mods:title>Montags Zeitung</mods:title>
</mods:titleInfo>
<mods:genre>newspaper</mods:genre>
<mods:originInfo>
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
</mods:originInfo>
<mods:language>
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
</mods:language>
<mods:relatedItem ID="DMD_article01_ex12" type="constituent">
<mods:titleInfo>
<mods:title>Wien, 10. mai.</mods:title>
</mods:titleInfo>
<mods:genre>article</mods:genre>
<mods:part ID="DMD_article01_para01_ex12" type="paragraph">
<mods:text />
</mods:part>
<mods:part ID="DMD_article01_para02_ex12" type="paragraph">
<mods:text />
</mods:part>
<mods:part ID="DMD_article01_para03_ex12" type="paragraph">
<mods:text />
</mods:part>
</mods:relatedItem>
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:structMap>
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex12">
<mets:div TYPE="news:page">
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_ex12">
</mets:div>
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para01_ex12">
</mets:div>
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para02_ex12">
</mets:div>
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para03_ex12">
</mets:div>
</mets:div>
</mets:div>
</mets:structMap>
</mets:mets>
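The level of access a document provides can be read off by resolving each pageRegion DMDID to the MODS element it names: a relatedItem indicates an article-level region, and a part indicates a paragraph-level region. The Python sketch below illustrates this. It assumes the standard METS and MODS namespace URIs and treats each DMDID as a single ID; the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}  # assumed: standard METS namespace
MODS_PREFIX = "{http://www.loc.gov/mods/v3}"  # assumed: MODS v3 namespace


def describe_regions(mets_xml):
    """Return (DMDID, MODS element name or None) for each pageRegion div."""
    root = ET.fromstring(mets_xml)
    targets = {
        el.get("ID"): el.tag[len(MODS_PREFIX):]
        for el in root.iter()
        if el.tag.startswith(MODS_PREFIX) and el.get("ID") is not None
    }
    return [
        (div.get("DMDID"), targets.get(div.get("DMDID")))
        for div in root.iterfind(".//mets:div[@TYPE='news:pageRegion']", NS)
    ]


# Hypothetical sample: one article with two paragraphs, each given a page region.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd03">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData>
      <mods:mods ID="DMD_issue">
        <mods:relatedItem ID="DMD_article01" type="constituent">
          <mods:genre>article</mods:genre>
          <mods:part ID="DMD_article01_para01" type="paragraph" />
          <mods:part ID="DMD_article01_para02" type="paragraph" />
        </mods:relatedItem>
      </mods:mods>
    </mets:xmlData></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap>
    <mets:div TYPE="news:issue" DMDID="DMD_issue">
      <mets:div TYPE="news:page">
        <mets:div TYPE="news:pageRegion" DMDID="DMD_article01" />
        <mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para01" />
        <mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para02" />
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

for dmdid, kind in describe_regions(SAMPLE):
    print(dmdid, kind)
```

A None in the second position marks a DMDID that matches no MODS element in the document.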
Technical Requirements
Content Files
Still Image Files
The master image files referenced by conforming documents must be in TIFF Revision 6.0 format. See Digital Formats for Library
of Congress Collections: TIFF, Revision 6.0 at http://www.digitalpreservation.gov/formats/fdd/fdd000022.shtml.
Text Files
Master text files referenced by conforming documents must be in XML format. See Digital Formats for Library of Congress Collections:
XML (Extensible Markup Language) at http://www.digitalpreservation.gov/formats/fdd/fdd000075.shtml.