LC METS Profile

METS Profile for Historical Newspapers [Draft]





Abstract

The METS Profile for Historical Newspapers specifies how METS documents representing digitized historical newspapers should be encoded. Each conforming document represents a single issue of a newspaper. The profile uses MODS to express the logical structure of a newspaper issue, and uses the METS structMap to express the physical structure of the issue.


URI

//www.loc.gov/standards/mets/profiles/LC_profiles/newspaper/00000010.xml


Date

2006-02-06T17:31:00

Contact

Morgan V. Cundiff
Network Development and MARC Standards Office, Library of Congress
101 Independence Avenue, Washington DC, 20540


Related Profile

No related profiles.


Extension Schema

Metadata Object Description Schema (MODS)
//www.loc.gov/mods/v3/


Extension Schema

Metadata for Images in XML (MIX)
//www.loc.gov/mix/


Extension Schema

Premis
//www.loc.gov/standards/premis


Description Rules

It is recommended that the content data used in the MODS section of a METS document conform to AACR2 to the extent that it applies.


Controlled Vocabularies

Library of Congress Subject Headings
Library of Congress
No URI established.
Controlled subject headings used in the MODS section of a METS document must be formulated according to the Library of Congress Subject Headings (LCSH).

Library of Congress Classification
Library of Congress
//www.loc.gov/catdir/cpso/lcc.html

NACO Authority File
Library of Congress
http://authorities.loc.gov/
Controlled name and title headings used in the MODS section of a METS document must be formulated according to the NACO Authority File.

MARC Country Codes
Library of Congress
//www.loc.gov/marc/countries/
Country codes used in the MODS section of a METS document must use the MARC country code list.

ISO 639-2 Language Codes
Library of Congress
//www.loc.gov/standards/iso639-2/
Language codes used in the MODS section of a METS document must use ISO 639-2 bibliographic codes.

MARC Relator Codes
Library of Congress
//www.loc.gov/marc/relators/
Relator codes or terms used in the MODS section of a METS document must use the MARC relator list.

Target Audience Codes
Library of Congress
//www.loc.gov/marc/sourcecode/target/
Terms used in the MODS section of a METS document must use the MARC target audience list.

MARC Code List for Organizations
Library of Congress
//www.loc.gov/marc/organizations/
Source codes used in the MODS section of a METS document must use the MARC organization list.

Structural Requirements

METS Root Element
metsRootElement Requirement 1

The METS root element must have an attribute PROFILE="//www.loc.gov/mets/profiles/00000010.xml".

Example 1.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
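
The root-element requirement can be checked mechanically. The following is a minimal sketch, not part of the profile: it assumes Python's standard xml.etree.ElementTree, the usual METS namespace URI, and a hypothetical helper name (has_required_profile). The sample document is illustrative only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# PROFILE value exactly as written in metsRootElement Requirement 1.
EXPECTED_PROFILE = "//www.loc.gov/mets/profiles/00000010.xml"


def has_required_profile(xml_text):
    """Return True if the root is mets:mets and carries the expected PROFILE value."""
    root = ET.fromstring(xml_text)
    return root.tag == f"{{{METS_NS}}}mets" and root.get("PROFILE") == EXPECTED_PROFILE


# Illustrative sample; unlike the fragments in this profile it declares its namespace.
sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'PROFILE="//www.loc.gov/mets/profiles/00000010.xml">'
    "<mets:structMap><mets:div/></mets:structMap>"
    "</mets:mets>"
)
print(has_required_profile(sample))
```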
  


Descriptive Metadata Section
dmdSec Requirement 1

A document must contain three Descriptive Metadata Sections (dmdSec). Each dmdSec element must have an ID attribute. The first dmdSec element will contain a reference (mdRef) to a bibliographic record in an external system for the printed version of the newspaper. Each mdRef element must have an ID attribute.

Example 3.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd01">
        
<mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_print" />
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
  


dmdSec Requirement 2

The second dmdSec element will contain a reference (mdRef) to a bibliographic record in an external system for the digital version of the newspaper.

Example 4.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd02">
        
<mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_digital" />
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
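
As a rough illustration of dmdSec Requirements 1 and 2, the sketch below counts the dmdSec elements and checks the ID attributes and the mdRef children of the first two sections. The function name check_dmdsecs and the sample document are hypothetical; only the structure is checked, not the records the mdRef elements point to.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def check_dmdsecs(xml_text):
    """Return a list of problems found against dmdSec Requirements 1 and 2."""
    root = ET.fromstring(xml_text.strip())
    secs = root.findall("mets:dmdSec", NS)
    problems = []
    if len(secs) != 3:
        problems.append(f"expected 3 dmdSec elements, found {len(secs)}")
    for i, sec in enumerate(secs, start=1):
        if not sec.get("ID"):
            problems.append(f"dmdSec {i} has no ID")
    # The first two sections reference external bibliographic records via mdRef.
    for i, sec in enumerate(secs[:2], start=1):
        ref = sec.find("mets:mdRef", NS)
        if ref is None:
            problems.append(f"dmdSec {i} has no mdRef")
        elif not ref.get("ID"):
            problems.append(f"mdRef in dmdSec {i} has no ID")
    return problems


# Illustrative sample (IDs and paths are placeholders).
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="dmd01">
    <mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_print"/>
  </mets:dmdSec>
  <mets:dmdSec ID="dmd02">
    <mets:mdRef xlink:href="path/to/record" LOCTYPE="URL" MDTYPE="MODS" ID="mods_digital"/>
  </mets:dmdSec>
  <mets:dmdSec ID="dmd03">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap><mets:div/></mets:structMap>
</mets:mets>
"""
print(check_dmdsecs(sample))
```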
  


dmdSec Requirement 3

The third dmdSec contains a wrapped (mdWrap) MODS record for the digitized issue of the newspaper. The MODS record must contain the title, issue date, genre (with value "newspaper") and language. The mods element must have an ID attribute.

Example 5.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03a">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex05">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
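
The required MODS fields for the issue record can be verified with a short script. This is a sketch under stated assumptions: check_issue_mods is a hypothetical helper, the regular expression only tests the shape of date-only W3CDTF values (YYYY, YYYY-MM, YYYY-MM-DD) and does not validate month or day ranges, and the sample record is illustrative.

```python
import re
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
# Shape of date-only W3CDTF values; time-of-day forms are not handled here.
W3CDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def check_issue_mods(mods):
    """Return a list of problems for a mods:mods element under dmdSec Requirement 3."""
    problems = []
    if not mods.get("ID"):
        problems.append("mods element has no ID")
    title = mods.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)
    if not title.strip():
        problems.append("missing title")
    genres = [g.text for g in mods.findall("mods:genre", NS)]
    if "newspaper" not in genres:
        problems.append('missing genre "newspaper"')
    date = mods.find("mods:originInfo/mods:dateIssued", NS)
    if date is None or not (date.text or "").strip():
        problems.append("missing issue date")
    elif date.get("encoding") == "w3cdtf" and not W3CDTF_DATE.match(date.text.strip()):
        problems.append(f"dateIssued {date.text!r} is not a W3CDTF date")
    if mods.find("mods:language/mods:languageTerm", NS) is None:
        problems.append("missing language")
    return problems


# Illustrative record modeled on the example above.
sample = ET.fromstring(
    """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:titleInfo><mods:title>Montags Zeitung</mods:title></mods:titleInfo>
  <mods:genre>newspaper</mods:genre>
  <mods:originInfo><mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued></mods:originInfo>
  <mods:language><mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm></mods:language>
</mods:mods>"""
)
print(check_issue_mods(sample))
```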
  


dmdSec Requirement 4

The MODS record may optionally describe the logical entities contained in a given issue of a newspaper. Logical entities include the following: issue section (issueSection), article, article section (articleSection), photograph, illustration, and advertisement. The logical entities are defined in Newspaper Genre Terms. [under construction]

Each logical entity will be described in the MODS record as a Related Item (relatedItem type="constituent"). Each relatedItem type="constituent" element must have an ID attribute. Each relatedItem type="constituent" element must have a genre child element with appropriate value (e.g. "article").

Example 6.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03b">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex06">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
              
<mods:relatedItem ID="DMD_article01_ex06" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Wien, 10. mai.</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
              
</mods:relatedItem>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
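
A possible check for the constituent rules is sketched below. It assumes that the genre values are spelled exactly like the entity names listed in this requirement (issueSection, article, articleSection, photograph, illustration, advertisement); because the Newspaper Genre Terms list is still under construction, that spelling is an assumption. The helper name check_constituents and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
# Assumed genre spellings, taken from the entity names in dmdSec Requirement 4.
ENTITY_GENRES = {
    "issueSection", "article", "articleSection",
    "photograph", "illustration", "advertisement",
}


def check_constituents(mods):
    """Return problems for each relatedItem type="constituent" child of a mods element."""
    problems = []
    for n, item in enumerate(mods.findall("mods:relatedItem", NS), start=1):
        if item.get("type") != "constituent":
            continue
        if not item.get("ID"):
            problems.append(f"relatedItem {n} has no ID")
        genre = item.findtext("mods:genre", default="", namespaces=NS).strip()
        if genre not in ENTITY_GENRES:
            problems.append(f"relatedItem {n} has genre {genre!r}")
    return problems


# Illustrative record: the second constituent breaks both rules.
sample = ET.fromstring(
    """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:relatedItem ID="DMD_article01" type="constituent">
    <mods:titleInfo><mods:title>Wien, 10. mai.</mods:title></mods:titleInfo>
    <mods:genre>article</mods:genre>
  </mods:relatedItem>
  <mods:relatedItem type="constituent">
    <mods:genre>editorial</mods:genre>
  </mods:relatedItem>
</mods:mods>"""
)
print(check_constituents(sample))
```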
  


dmdSec Requirement 5

Each logical entity (i.e. each relatedItem type="constituent" element) may in turn contain structural subparts. At present the only subpart type is paragraph. Subparts are expressed as part elements whose type attribute names the subpart type (e.g. type="paragraph"). The subpart types are also defined in Newspaper Genre Terms [under construction]. In the following example, the relatedItem element (for an article) contains one part child element for each paragraph in the article. Each part element must have an ID attribute.

Example 7.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03c">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex07">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
              
<mods:relatedItem ID="DMD_article01_ex07" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Wien, 10. mai.</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
                
<mods:part ID="DMD_article01_para01_ex07" type="paragraph">
                  
<mods:text />
                
</mods:part>
                
<mods:part ID="DMD_article01_para02_ex07" type="paragraph">
                  
<mods:text />
                
</mods:part>
                
<mods:part ID="DMD_article01_para03_ex07" type="paragraph">
                  
<mods:text />
                
</mods:part>
              
</mods:relatedItem>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
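
To see how the part elements can be consumed, the sketch below collects the paragraph IDs of each constituent in document order. The helper name paragraph_ids and the shortened sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def paragraph_ids(mods):
    """Map each constituent relatedItem ID to the IDs of its paragraph parts."""
    result = {}
    for item in mods.findall("mods:relatedItem", NS):
        if item.get("type") != "constituent":
            continue
        result[item.get("ID")] = [
            part.get("ID")
            for part in item.findall("mods:part", NS)
            if part.get("type") == "paragraph"
        ]
    return result


# Illustrative record with one article and two paragraphs.
sample = ET.fromstring(
    """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" ID="DMD_issue">
  <mods:relatedItem ID="DMD_article01" type="constituent">
    <mods:genre>article</mods:genre>
    <mods:part ID="DMD_article01_para01" type="paragraph"><mods:text/></mods:part>
    <mods:part ID="DMD_article01_para02" type="paragraph"><mods:text/></mods:part>
  </mods:relatedItem>
</mods:mods>"""
)
print(paragraph_ids(sample))
```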
  


Administrative Metadata Section
amdSec Requirement 1

[Example of how to use PREMIS needs work]. A METS document may optionally contain preservation-related metadata expressed using the PREMIS schemas. PREMIS data can be associated either with the entire object (i.e. data that is applicable to all component files), which in PREMIS terms is at the level of the object category "representation", or with individual files. The following brief example shows a possible encoding for the date of a preservation "event" (i.e. the date of digitization) for an audio file. Note that (as of this writing) PREMIS is still very new, and that best practices for its use with METS have not yet emerged.

Example 8.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:structMap>
        
<mets:div />
      
</mets:structMap>
    
</mets:mets>
  


File Section
fileSec Requirement 1

The content files referenced by the fptr elements (in the structMap) must point to the appropriate files in the File Section (fileSec). The profile makes no further requirements regarding the fileSec element. The example document in Appendix 1 provides an example of Library of Congress practice.
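
The pointer requirement amounts to a referential-integrity check: every FILEID used in the structMap should match the ID of a file element. A minimal sketch follows; the helper name unresolved_file_ids and the sample document are hypothetical, and FILEID attributes on both fptr and area elements are considered.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def unresolved_file_ids(xml_text):
    """Return the sorted FILEID values in the structMap that match no mets:file ID."""
    root = ET.fromstring(xml_text.strip())
    known = {f.get("ID") for f in root.iter(f"{{{METS}}}file")}
    referenced = set()
    for struct_map in root.findall("mets:structMap", NS):
        for tag in ("fptr", "area"):
            for el in struct_map.iter(f"{{{METS}}}{tag}"):
                if el.get("FILEID"):
                    referenced.add(el.get("FILEID"))
    return sorted(referenced - known)


# Illustrative sample: the second page points at a file that is not declared.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="IMG00001">
      <mets:FLocat LOCTYPE="URL" xlink:href="issue0001_0001.tif"/>
    </mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div TYPE="news:issue">
    <mets:div TYPE="news:page">
      <mets:div TYPE="news:image"><mets:fptr FILEID="IMG00001"/></mets:div>
    </mets:div>
    <mets:div TYPE="news:page">
      <mets:div TYPE="news:image"><mets:fptr FILEID="IMG00002"/></mets:div>
    </mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""
print(unresolved_file_ids(sample))
```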

Structure Map
structMap Requirement 1

The physical structure of the newspaper will be represented in the Structure Map (structMap) section of the METS document. The document must have one and only one structMap. Three physical entities are expressed in the structMap: issue, page, and page region (pageRegion). These entities are expressed as a hierarchy of typed Division (div) elements (div TYPE="news:issue", div TYPE="news:page", and div TYPE="news:pageRegion").

The structMap element must contain one and only one div TYPE="news:issue" child element. The div TYPE="news:issue" element must have a DMDID attribute that points to the mods element for the issue. The div TYPE="news:issue" element will contain one div TYPE="news:page" child element for each page in the newspaper issue. The following example shows an issue that consists of four pages. Note that the div TYPE="news:page" elements do not require a DMDID attribute, as the physical entity "page" does not directly correspond to a logical entity in the bibliographic description.

Example 9.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03d">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex09">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex09">
          
<mets:div TYPE="news:page">
          
</mets:div>
          
<mets:div TYPE="news:page">
          
</mets:div>
          
<mets:div TYPE="news:page">
          
</mets:div>
          
<mets:div TYPE="news:page">
          
</mets:div>
        
</mets:div>
      
</mets:structMap>
    
</mets:mets>
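
The structural rules of structMap Requirement 1 can be tested along the following lines. This is only a sketch: check_structmap is a hypothetical helper, the DMDID is treated as a single ID rather than a space-separated list, and the sample document is illustrative.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
NS = {"mets": METS, "mods": MODS}


def check_structmap(xml_text):
    """Return (problems, page_count) for the structMap of a METS document."""
    root = ET.fromstring(xml_text.strip())
    maps = root.findall("mets:structMap", NS)
    if len(maps) != 1:
        return [f"expected exactly one structMap, found {len(maps)}"], 0
    issues = [d for d in maps[0].findall("mets:div", NS) if d.get("TYPE") == "news:issue"]
    if len(issues) != 1:
        return [f"expected exactly one news:issue div, found {len(issues)}"], 0
    issue = issues[0]
    problems = []
    # The issue div must point at the mods element for the issue.
    mods_ids = {m.get("ID") for m in root.iter(f"{{{MODS}}}mods")}
    if issue.get("DMDID") not in mods_ids:
        problems.append(f"issue div DMDID {issue.get('DMDID')!r} does not match a mods ID")
    pages = [d for d in issue.findall("mets:div", NS) if d.get("TYPE") == "news:page"]
    return problems, len(pages)


# Illustrative sample with a four-page issue.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd03">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData>
      <mods:mods ID="DMD_issue"><mods:genre>newspaper</mods:genre></mods:mods>
    </mets:xmlData></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap>
    <mets:div TYPE="news:issue" DMDID="DMD_issue">
      <mets:div TYPE="news:page"/><mets:div TYPE="news:page"/>
      <mets:div TYPE="news:page"/><mets:div TYPE="news:page"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""
print(check_structmap(sample))
```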
  


structMap Requirement 2

Each div TYPE="news:page" element may contain a child div element for each form of digitized content that represents the page. The three possibilities are: an image of the page (mets:div TYPE="news:image"), an ALTO file for the page (mets:div TYPE="news:alto"), or a text version of the page (mets:div TYPE="news:text"). The following example shows two pages, each with a corresponding image file and ALTO file. Note that the mets:div TYPE="news:image" elements and the mets:div TYPE="news:alto" elements each contain File Pointer (fptr) elements that point to the corresponding file elements in the File Section (fileSec).

Example 10.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03e">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex10">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>

      
<mets:fileSec>
        
<mets:fileGrp>
          
<mets:file ID="IMG00001_ex10" MIMETYPE="image/tiff">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0001.tif" />
          
</mets:file>
          
<mets:file ID="IMG00002_ex10" MIMETYPE="image/tiff">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0002.tif" />
          
</mets:file>
        
</mets:fileGrp>

        
<mets:fileGrp>
          
<mets:file ID="ALT00001_ex10" MIMETYPE="text/xml">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00001.xml" />
          
</mets:file>
          
<mets:file ID="ALT00002_ex10" MIMETYPE="text/xml">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00002.xml" />
          
</mets:file>
        
</mets:fileGrp>
      
</mets:fileSec>

      
<mets:structMap>
        
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex10">
          
<mets:div TYPE="news:page">
            
<mets:div TYPE="news:image">
              
<mets:fptr FILEID="IMG00001_ex10" />
            
</mets:div>
            
<mets:div TYPE="news:alto">
              
<mets:fptr FILEID="ALT00001_ex10" />
            
</mets:div>
          
</mets:div>
          
<mets:div TYPE="news:page">
            
<mets:div TYPE="news:image">
              
<mets:fptr FILEID="IMG00002_ex10" />
            
</mets:div>
            
<mets:div TYPE="news:alto">
              
<mets:fptr FILEID="ALT00002_ex10" />
            
</mets:div>
          
</mets:div>
        
</mets:div>
      
</mets:structMap>
    
</mets:mets>
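
One way an application might consume this structure is to resolve, for every page, the location of each content form. The sketch below assumes the standard METS and XLink namespace URIs; the helper name page_files and the shortened sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"mets": METS}


def page_files(xml_text):
    """Return one dict per news:page mapping content-form TYPE to the file location."""
    root = ET.fromstring(xml_text.strip())
    # Index file locations by file ID.
    hrefs = {}
    for f in root.iter(f"{{{METS}}}file"):
        loc = f.find("mets:FLocat", NS)
        if loc is not None:
            hrefs[f.get("ID")] = loc.get(f"{{{XLINK}}}href")
    pages = []
    for page in root.iter(f"{{{METS}}}div"):
        if page.get("TYPE") != "news:page":
            continue
        forms = {}
        for child in page.findall("mets:div", NS):
            fptr = child.find("mets:fptr", NS)
            if fptr is not None:
                forms[child.get("TYPE")] = hrefs.get(fptr.get("FILEID"))
        pages.append(forms)
    return pages


# Illustrative one-page sample with an image file and an ALTO file.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="IMG00001"><mets:FLocat LOCTYPE="URL" xlink:href="img/0001.tif"/></mets:file>
    <mets:file ID="ALT00001"><mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div TYPE="news:issue">
    <mets:div TYPE="news:page">
      <mets:div TYPE="news:image"><mets:fptr FILEID="IMG00001"/></mets:div>
      <mets:div TYPE="news:alto"><mets:fptr FILEID="ALT00001"/></mets:div>
    </mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""
print(page_files(sample))
```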
  


structMap Requirement 3

A div TYPE="news:page" element may contain one or more div TYPE="news:pageRegion" elements. A "pageRegion" is a portion of a page file that corresponds to a particular logical entity (e.g. a newspaper article). In the example below the issue has two pages with three articles. Each div TYPE="news:pageRegion" element must contain a mets:div TYPE="news:alto" child element with an fptr element. The fptr element must contain an area child element with FILEID and BEGIN attributes that point, respectively, to the corresponding ALTO file and to the ID within that file that identifies the pageRegion.

Example 11.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03f">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex11">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
              
<mods:relatedItem ID="DMD_article01_ex11" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Wien, 10. mai.</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
              
</mods:relatedItem>
              
<mods:relatedItem ID="DMD_article02_ex11" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Neueste nachrichten</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
              
</mods:relatedItem>
              
<mods:relatedItem ID="DMD_article03_ex11" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Notizen</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
              
</mods:relatedItem>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>

      
<mets:fileSec>
        
<mets:fileGrp>
          
<mets:file ID="IMG00001_ex11" MIMETYPE="image/tiff">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0001.tif" />
          
</mets:file>
          
<mets:file ID="IMG00002_ex11" MIMETYPE="image/tiff">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-img/issue0001_0002.tif" />
          
</mets:file>
        
</mets:fileGrp>

        
<mets:fileGrp>
          
<mets:file ID="ALT00001_ex11" MIMETYPE="text/xml">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00001.xml" />
          
</mets:file>
          
<mets:file ID="ALT00002_ex11" MIMETYPE="text/xml">
            
<mets:FLocat LOCTYPE="URL" xlink:href="file://./issue0001-alto/issue0001-alto00002.xml" />
          
</mets:file>
        
</mets:fileGrp>
      
</mets:fileSec>

      
<mets:structMap>
        
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex11">
          
<mets:div TYPE="news:page">
            
<!-- article 1 title: "Wien, 10. Mai."  -->
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_ex11">
              
<mets:div TYPE="news:alto">
                
<mets:fptr>
                  
<mets:area FILEID="ALT00001_ex11" BEGIN="P1_TB00005" />
                
</mets:fptr>
              
</mets:div>
            
</mets:div>
          
</mets:div>
          
<mets:div TYPE="news:page">
            
<!-- article 2 title: "Neueste Nachrichten."  -->
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article02_ex11">
              
<mets:div TYPE="news:alto">
                
<mets:fptr>
                  
<mets:area FILEID="ALT00002_ex11" BEGIN="P2_TB00018" />
                
</mets:fptr>
              
</mets:div>
            
</mets:div>
            
<!-- article 3 title: "Notizen."  -->
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article03_ex11">
              
<mets:div TYPE="news:alto">
                
<mets:fptr>
                  
<mets:area FILEID="ALT00002_ex11" BEGIN="P2_TB00024" />
                
</mets:fptr>
              
</mets:div>
            
</mets:div>
          
</mets:div>
        
</mets:div>
      
</mets:structMap>
    
</mets:mets>
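
The link from a pageRegion to its article and to the ALTO block can be followed programmatically. The sketch below returns, for each pageRegion, the article title together with the FILEID and BEGIN values of its area element, and raises an error when the required area is missing. The helper name region_targets and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
NS = {"mets": METS, "mods": MODS}


def region_targets(xml_text):
    """Return (article title, ALTO FILEID, BEGIN) for each news:pageRegion div."""
    root = ET.fromstring(xml_text.strip())
    titles = {
        item.get("ID"): item.findtext("mods:titleInfo/mods:title", namespaces=NS)
        for item in root.iter(f"{{{MODS}}}relatedItem")
    }
    targets = []
    for div in root.iter(f"{{{METS}}}div"):
        if div.get("TYPE") != "news:pageRegion":
            continue
        area = None
        for child in div.findall("mets:div", NS):
            if child.get("TYPE") == "news:alto":
                area = child.find("mets:fptr/mets:area", NS)
        if area is None or not area.get("FILEID") or not area.get("BEGIN"):
            raise ValueError(f"pageRegion {div.get('DMDID')!r} lacks an area with FILEID and BEGIN")
        targets.append((titles.get(div.get("DMDID")), area.get("FILEID"), area.get("BEGIN")))
    return targets


# Illustrative sample with a single article region.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd03"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods ID="DMD_issue">
      <mods:relatedItem ID="DMD_article01" type="constituent">
        <mods:titleInfo><mods:title>Wien, 10. mai.</mods:title></mods:titleInfo>
        <mods:genre>article</mods:genre>
      </mods:relatedItem>
    </mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:structMap><mets:div TYPE="news:issue" DMDID="DMD_issue">
    <mets:div TYPE="news:page">
      <mets:div TYPE="news:pageRegion" DMDID="DMD_article01">
        <mets:div TYPE="news:alto">
          <mets:fptr><mets:area FILEID="ALT00001" BEGIN="P1_TB00005"/></mets:fptr>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""
print(region_targets(sample))
```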
  


structMap Requirement 4

It is possible, using div TYPE="news:pageRegion" elements, to provide any level of access to the contents of a particular newspaper page. The previous example provides what might be called "article level" access. The following example provides what might be called "article and paragraph level" access.

Example 12.

    
<mets:mets PROFILE="//www.loc.gov/mets/profiles/00000010.xml">
      
<mets:dmdSec ID="dmd03g">
        
<mets:mdWrap MDTYPE="MODS">
          
<mets:xmlData>
            
<mods:mods ID="DMD_issue_ex12">
              
<mods:titleInfo>
                
<mods:title>Montags Zeitung</mods:title>
              
</mods:titleInfo>
              
<mods:genre>newspaper</mods:genre>
              
<mods:originInfo>
                
<mods:dateIssued encoding="w3cdtf">1908-05-11</mods:dateIssued>
              
</mods:originInfo>
              
<mods:language>
                
<mods:languageTerm type="code" authority="rfc3066">de</mods:languageTerm>
              
</mods:language>
              
<mods:relatedItem ID="DMD_article01_ex12" type="constituent">
                
<mods:titleInfo>
                  
<mods:title>Wien, 10. mai.</mods:title>
                
</mods:titleInfo>
                
<mods:genre>article</mods:genre>
                
<mods:part ID="DMD_article01_para01_ex12" type="paragraph">
                  
<mods:text />
                
</mods:part>
                
<mods:part ID="DMD_article01_para02_ex12" type="paragraph">
                  
<mods:text />
                
</mods:part>
                
<mods:part ID="DMD_article01_para03_ex12" type="paragraph">
                  
<mods:text />
                
</mods:part>
              
</mods:relatedItem>
            
</mods:mods>
          
</mets:xmlData>
        
</mets:mdWrap>
      
</mets:dmdSec>
      
<mets:structMap>
        
<mets:div TYPE="news:issue" DMDID="DMD_issue_ex12">
          
<mets:div TYPE="news:page">
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_ex12">
            
</mets:div>
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para01_ex12">
            
</mets:div>
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para02_ex12">
            
</mets:div>
            
<mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para03_ex12">
            
</mets:div>
          
</mets:div>
        
</mets:div>
      
</mets:structMap>
    
</mets:mets>
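
The level of access a document provides can be summarized by looking at what each pageRegion points to: a relatedItem (reported by its genre, e.g. article) or a part (reported by its type, e.g. paragraph). The following sketch counts regions per level; access_levels is a hypothetical helper, regions whose DMDID matches nothing are counted as "unresolved", and the sample document is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS}


def access_levels(xml_text):
    """Count news:pageRegion divs by the kind of MODS element their DMDID points to."""
    root = ET.fromstring(xml_text.strip())
    kinds = {}
    for item in root.iter(f"{{{MODS}}}relatedItem"):
        kinds[item.get("ID")] = item.findtext("mods:genre", namespaces=NS)
    for part in root.iter(f"{{{MODS}}}part"):
        kinds[part.get("ID")] = part.get("type")
    counts = Counter()
    for div in root.iter(f"{{{METS}}}div"):
        if div.get("TYPE") == "news:pageRegion":
            counts[kinds.get(div.get("DMDID"), "unresolved")] += 1
    return dict(counts)


# Illustrative sample: one article region and two paragraph regions.
sample = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="dmd03"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods ID="DMD_issue">
      <mods:relatedItem ID="DMD_article01" type="constituent">
        <mods:genre>article</mods:genre>
        <mods:part ID="DMD_article01_para01" type="paragraph"><mods:text/></mods:part>
        <mods:part ID="DMD_article01_para02" type="paragraph"><mods:text/></mods:part>
      </mods:relatedItem>
    </mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:structMap><mets:div TYPE="news:issue" DMDID="DMD_issue">
    <mets:div TYPE="news:page">
      <mets:div TYPE="news:pageRegion" DMDID="DMD_article01"/>
      <mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para01"/>
      <mets:div TYPE="news:pageRegion" DMDID="DMD_article01_para02"/>
    </mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""
print(access_levels(sample))
```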
  


Technical Requirements

Content Files

Still Image Files

The master image files referenced by conforming documents must be in TIFF Revision 6.0 format. See Digital Formats for Library of Congress Collections: TIFF, Revision 6.0 at http://www.digitalpreservation.gov/formats/fdd/fdd000022.shtml.
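
As a first-pass sanity check before fuller format validation, the header of a master image can be inspected. The sketch below only tests the byte-order mark and the magic number 42 that open a classic TIFF file; it does not establish conformance to TIFF Revision 6.0, and the helper name looks_like_tiff is hypothetical.

```python
import struct


def looks_like_tiff(header):
    """Return True if the first four bytes form a classic TIFF header."""
    if len(header) < 4:
        return False
    if header[:2] == b"II":      # little-endian byte order
        fmt = "<H"
    elif header[:2] == b"MM":    # big-endian byte order
        fmt = ">H"
    else:
        return False
    # Bytes 2-3 hold the magic number 42 in the declared byte order.
    return struct.unpack(fmt, header[2:4])[0] == 42


print(looks_like_tiff(b"II*\x00"))
print(looks_like_tiff(b"\x89PNG"))
```

In practice the header would be read from the file itself, for example with open(path, "rb").read(4), where path is the location of the master image.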

Text Files

Master text files referenced by conforming documents must be in XML format. See Digital Formats for Library of Congress Collections: XML (Extensible Markup Language) at http://www.digitalpreservation.gov/formats/fdd/fdd000075.shtml.