Metadata Encoding and Transmission Standard (METS) Official Web Site
METS_Profile: @xsi:schemaLocation="http://www.loc.gov/METS_Profile/ http://www.loc.gov/standards/mets/profile_docs/mets.profile.v1-2.xsd"
title:
RLUK 19th Century Pamphlets METS Profile
abstract:
This profile specifies the use of METS to provide metadata for the project to digitise 19th Century Pamphlets under Phase 2 of the JISC Digitisation Programme. Further information about the project can be found at:
http://www.southampton.ac.uk/library/bopcris/
http://www.jisc.ac.uk/whatwedo/programmes/programme_digitisation/pamphlets.aspx

The materials to be digitised comprise single pamphlets, sometimes bound into a volume, sometimes not. The item level is defined to be a pamphlet. One METS document should be present for each digitised pamphlet. If the pamphlets were bound into a volume, the front and back matter of the volume, including tables of contents, have also been scanned. In such cases, one METS document should also be present for each volume, containing the images of the volume and references to the METS documents for the pamphlets contained within the volume.

The digital files for each pamphlet should comprise master images (TIFF) and OCR full-text in two formats (plain text and word co-ordinated XML). The digital files for each bound volume should comprise master images only. All digital files are packaged together with the METS and extension metadata into a single digital object (TAR) for delivery and storage.
date:
2008-08-06T00:00:00
contact:
name:
Ed Fay
institution:
University of Southampton
address:
BOPCRIS Digitisation Centre, Hartley Library, University of Southampton, SO17 1BJ, UK
phone:
+44 (0) 23 8059 3575
email:
E.Fay@soton.ac.uk
extension_schema: @ID="MODS"
name:
Metadata Object Description Schema (MODS)
context:
mets/dmdSec/mdWrap/xmlData
note:
Bibliographic metadata at the item level, supplied in MODS by COPAC (http://www.copac.ac.uk). There will be one instance per pamphlet, and none per bound volume. This metadata will be contained in a dmdSec linked to the top level div in the logical structMap. MODS metadata will conform to version 3.2 of the schema.
extension_schema: @ID="MIX"
name:
NISO Metadata for Images in XML (NISO MIX)
context:
mets/amdSec/techMD/mdWrap/xmlData
note:
Technical metadata at the file level, extracted from standard file information and TIFF headers. There will be one instance per master image file listed in the fileSec. This metadata will be contained in an amdSec linked to the relevant file element. MIX metadata will conform to version 2.0 of the schema.
extension_schema: @ID="PREMIS"
name:
PREMIS Data Dictionary for Preservation Metadata
context:
mets/amdSec/digiProvMD|techMD/mdWrap/xmlData
note:
Technical metadata at the file level, extracted from standard file information. There will be one instance per file of any kind listed in the fileSec. This metadata will be contained in an amdSec linked to the relevant file element. The PREMIS components will be contained within separate METS elements, as suggested by the "Guidelines for using PREMIS within METS" at http://www.loc.gov/standards/premis/premis-mets.html
techMD: premis:object for each file
digiProvMD: premis:event containing information about file derivations
digiProvMD: premis:agent containing information about software packages
PREMIS metadata will conform to version 2.0 of the schema.
extension_schema: @ID="descriptors"
name:
BOPCRIS Descriptors Metadata
context:
mets/dmdSec/mdWrap/xmlData
note:
Descriptive metadata at the page level, recorded by scanner operators at the point of scanning. This schema indicates attributes of a page, for example that it contains text in a certain language, imagery or tabular data. There are 12 such attributes enumerated within the schema. There will be one instance per item page, indicating language as a minimum. Language information uses the controlled vocabulary ISO 639.2. This metadata will be contained in a dmdSec linked to the page level div in the logical structMap. BOPCRIS Descriptors metadata will conform to version 0.2 of the schema.
extension_schema: @ID="provenance"
name:
BOPCRIS Provenance Metadata
context:
mets/amdSec/digiProvMD/mdWrap/xmlData
note:
Administrative metadata at the object level, indicating the provenance of the item. This schema contains information about the source library, collection and shelfmark of the original item. There will be one instance per pamphlet, none per bound volume. This metadata will be contained in an amdSec linked to the top level div in the logical structMap. BOPCRIS Provenance metadata will conform to version 0.1 of the schema.
extension_schema: @ID="rights"
name:
BOPCRIS Rights Metadata
context:
mets/amdSec/rightsMD/mdWrap/xmlData
note:
Administrative metadata at the item level, indicating the copyright status of the item. For a full explanation of the copyright status, see the project documentation. There will be one instance per pamphlet, none per bound volume. This metadata will be contained in an amdSec linked to the top level div in the logical structMap. BOPCRIS Rights metadata will conform to version 0.1 of the schema.
description_rules:

Bibliographic records will conform to the descriptive specifications of COPAC at the time of export (2007).

For further information consult COPAC: http://www.copac.ac.uk/

controlled_vocabularies:
vocabulary: @ID="ISO_639_2"
name:
ISO 639.2 Codes for the representation of names of languages - Part 2: alpha-3 code
maintenance_agency:
Library of Congress
context:

BOPCRIS Descriptors metadata uses ISO 639.2 language codes to indicate the presence of text in a certain language on a page.

structural_requirements:
metsRootElement:
requirement: @ID="OBJID"

OBJID must contain a project specific ID.

BOPCRIS generated a unique ID for each pamphlet based on: the local holding ID from the bibliographic record, library identifier, volume identifier (if applicable, otherwise the default value 000 was used), and another number to indicate whether the pamphlet is a duplicate item.

Bound volume identifiers are generated from the library identifier and volume identifier only.

Library identifiers are identical to the institution codes used to identify libraries within the COPAC database. Volume identifiers are derived from the information about the volume available within the bibliographic record, in most instances, or from another source, such as a barcode, where necessary.

Pamphlet ID form: holdingID_library-volume-duplicate
Examples:
1234567890_tst-000-1
27000123456_bri-1800-2
19B39982X_liv-531-1

Bound volume ID form: library-volume
Examples:
liv-531
ucl-A70
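
The two ID forms above can be sketched in code. This is an illustrative sketch only, not the BOPCRIS tooling: the function names and regular expressions are assumptions derived from the stated forms, and the duplicate number is assumed to be numeric.

```python
import re

# Assumed pattern for the stated form holdingID_library-volume-duplicate.
PAMPHLET_ID = re.compile(
    r"^(?P<holding>[^_]+)_(?P<library>[^-]+)-(?P<volume>.+)-(?P<duplicate>\d+)$"
)
# Assumed pattern for the stated form library-volume.
VOLUME_ID = re.compile(r"^(?P<library>[^-_]+)-(?P<volume>[^_]+)$")


def build_pamphlet_id(holding_id, library, volume=None, duplicate=1):
    """Compose a pamphlet OBJID; the volume falls back to the default 000."""
    return f"{holding_id}_{library}-{volume or '000'}-{duplicate}"


def build_volume_id(library, volume):
    """Compose a bound volume OBJID from library and volume identifiers."""
    return f"{library}-{volume}"


def parse_objid(objid):
    """Split an OBJID into its components and report which kind it is."""
    match = PAMPHLET_ID.match(objid)
    if match:
        return {"kind": "pamphlet", **match.groupdict()}
    match = VOLUME_ID.match(objid)
    if match:
        return {"kind": "volume", **match.groupdict()}
    raise ValueError(f"not a recognised OBJID: {objid!r}")
```

Pamphlet IDs are tried first because they always contain an underscore, which the assumed bound volume pattern rejects.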

metsHdr:
requirement: @ID="metsHdr"
head:
Applies to: pamphlet and bound volume METS documents.

metsHdr must be present, indicating creation and modification dates. At the point of delivery these will be identical, as METS documents are generated as an export at the end of the creation process for a digital object, and are not subject to ongoing revision.

Sub-element agent (role="CREATOR") will indicate the source of the object at BOPCRIS.
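
A minimal sketch of a header meeting this requirement, using Python's standard xml.etree.ElementTree. The function name and the agent name passed in are assumptions for illustration; the profile does not state what text the agent's name element carries.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_mets_hdr(created, creator_name):
    """Build a metsHdr whose creation and modification dates are identical."""
    stamp = created.strftime("%Y-%m-%dT%H:%M:%S")
    hdr = ET.Element(
        f"{{{METS_NS}}}metsHdr",
        {"CREATEDATE": stamp, "LASTMODDATE": stamp},
    )
    # The agent with ROLE="CREATOR" indicates the source of the object.
    agent = ET.SubElement(hdr, f"{{{METS_NS}}}agent", {"ROLE": "CREATOR"})
    name = ET.SubElement(agent, f"{{{METS_NS}}}name")
    name.text = creator_name  # hypothetical value, e.g. "BOPCRIS"
    return hdr
```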

dmdSec:
requirement: @ID="dmdSec_Biblio"
head:
Applies to: pamphlet METS documents only.

A dmdSec with attribute ID="dmdSec_Biblio" will be present containing the bibliographic record of the item.

The bibliographic record will be in XML (MIMETYPE="text/xml") and MODS format (MDTYPE="MODS").

The record will be contained within the METS document using mdWrap.
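
The wrapping described above might look as follows in code. This is a sketch under assumptions: the function name is hypothetical, and the caller is assumed to supply the MODS record as an already parsed element.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_biblio_dmdsec(mods_record):
    """Wrap a parsed MODS record in dmdSec/mdWrap/xmlData."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": "dmdSec_Biblio"})
    wrap = ET.SubElement(
        dmd,
        f"{{{METS_NS}}}mdWrap",
        {"MIMETYPE": "text/xml", "MDTYPE": "MODS"},
    )
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(mods_record)  # the record is embedded, not referenced
    return dmd
```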

requirement: @ID="dmdSec_page"
head:
Applies to: pamphlet METS documents only.

dmdSecs, one per physical page, will be present, containing descriptive metadata at the page level. The attribute ID will be "dmdSec" + "_" + physical page identifier (see requirement ID="file_naming").
Examples:
dmdSec_00000001
dmdSec_00000007
dmdSec_00000017

Metadata at this level will be in XML (MIMETYPE="text/xml") and BOPCRIS Descriptors Metadata format (MDTYPE="OTHER").

This metadata will be contained within the METS document using mdWrap.

amdSec:
requirement: @ID="amdSec_Object"
head:
Applies to: pamphlet METS documents only.

An amdSec with attribute ID="amdSec_Object" will be present containing administrative metadata at the item level.

This metadata will include:
rightsMD ID="rightsMD_Object" containing rights metadata for the item in XML (MIMETYPE="text/xml") and BOPCRIS Rights Metadata format (MDTYPE="OTHER").
digiProvMD ID="digiprovMD_Object" containing provenance metadata for the digital object in XML (MIMETYPE="text/xml") and BOPCRIS Provenance Metadata format (MDTYPE="OTHER").

These metadata will be contained within the METS document using mdWrap.

requirement: @ID="amdSec_PREMIS-AGENTS"
head:
Applies to: pamphlet METS documents only.

An amdSec with attribute ID="amdSec_PREMIS-AGENTS" will be present. In cases where post-processing actions have been performed on images, such as cropping and/or OCR, this section will contain premis:agent elements indicating the agents that performed such actions.

requirement: @ID="amdSec_file"
head:
Applies to: pamphlet and bound volume METS documents.

amdSecs, one per file, will be present, containing technical information encoded in multiple extension schemata.

The attribute ID will be constructed from "amdSec" + "_" + file ID (see requirement ID="file").
Examples (pamphlet item):
amdSec_MASTER_00000001
amdSec_TXT_00000007
amdSec_IDX_00000017
Examples (bound volume):
amdSec_MASTER-volume-front-00000001
amdSec_MASTER-volume-back-00000001

amdSecs for every file will contain PREMIS metadata: All files will have a premis:object element. In cases where files are the parent or child of another (due to OCR processing), this will be indicated in the premis:object. The event of derivation will be recorded in a premis:event element, linked to a premis:agent.

The premis:object for each file will indicate the format, and should indicate a registry providing format information when available.

amdSecs for master image files will additionally contain MIX metadata.

fileSec:
requirement: @ID="fileSec"
head:
Applies to: pamphlet and bound volume METS documents.

There will be one file group per file type. The fileGrp attribute ID will indicate the file type:
Master images: MASTER
Plain text OCR: TXT
Word co-ordinated XML OCR: IDX

Pamphlet items will contain all three file groups; bound volumes will contain only master images.

requirement: @ID="file"
head:
Applies to: pamphlet and bound volume METS documents.

There will be one file element per file, referencing the file location using FLocat . FLocat elements will be LOCTYPE="URL" and use xlink:href to point to files. File locations will be given relative to the path of the METS document.

As the digital object is packaged into a single TAR file for delivery, relative paths allow the TAR package to be unpacked anywhere and, providing the directory structure is maintained on unpacking, all paths should remain accurate.

file ID will be constructed from fileGrp ID + "_" + physical page identifier (see requirement ID="file_naming").
Examples (pamphlet item):
MASTER_00000001
TXT_00000007
IDX_00000017
Examples (bound volume):
MASTER_volume-front-00000001
MASTER_volume-back-00000001

file GROUPID will be identical to the physical page identifier (see requirement ID="file_naming").
Example (pamphlet item):
00000001
Examples (bound volume):
volume-front-00000001
volume-back-00000001

file ADMID will link to the amdSec containing the technical metadata for the file, as indicated in requirement ID="amdSec_file".

file CHECKSUM and CHECKSUMTYPE will be present. Checksums will be calculated using MD5.

file MIMETYPE will be present.
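
The file element described in this requirement can be sketched as below. The function names are assumptions for illustration, and the file content is passed in as bytes to keep the sketch self-contained. ADMID is the METS attribute that links a file to its administrative metadata.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def md5_hex(data):
    """Return the MD5 checksum of the file content as a hex string."""
    return hashlib.md5(data).hexdigest()


def build_file_element(group_id, page_id, rel_path, mimetype, data):
    """Build a file element with a relative FLocat, as the profile requires."""
    file_id = f"{group_id}_{page_id}"
    file_el = ET.Element(
        f"{{{METS_NS}}}file",
        {
            "ID": file_id,
            "GROUPID": page_id,
            "ADMID": f"amdSec_{file_id}",  # per requirement ID="amdSec_file"
            "MIMETYPE": mimetype,
            "CHECKSUM": md5_hex(data),
            "CHECKSUMTYPE": "MD5",
        },
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": rel_path},
    )
    return file_el
```

Because the href stays relative to the METS document, the same element remains valid wherever the TAR package is unpacked.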

structMap:
requirement: @ID="structMaps"
head:
Applies to: pamphlet and bound volume METS documents.

There will be a logical and physical structure map.

In the case of pamphlet items: The logical and physical structure maps will be identical, except the logical structure map will also contain ID linkages to relevant metadata sections.

In the case of bound volumes: The physical structure map will contain only the image files comprising the covers. The logical structure map will also contain pointers to the METS documents of pamphlets contained within that volume.

requirement: @ID="structMap_logical"
head:
Applies to: pamphlet and bound volume METS documents.

In all cases, the following attributes will be present:
ID="structMap_logical"
TYPE="logical"

In the case of pamphlet items: The top level will contain one div: ID="logical_root". This div will be linked to dmdSec_Biblio and amdSec_Object. This div will contain page level divs (see requirement ID="div_page").

In the case of bound volumes: The top level will contain one div: ID="logical_root". This div will contain further divs: TYPE="section". These divs will contain either page level divs (see requirement ID="div_page") or mptr elements pointing to the pamphlets contained within the volume.
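
For bound volumes, a section div that points to a pamphlet's METS document might be built as in the sketch below. The profile states only that METS documents are named OBJID + ".xml"; where a pamphlet document sits relative to the volume document is not specified, so the href_base parameter and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_section_with_mptr(pamphlet_objid, href_base=""):
    """Build a TYPE="section" div holding one mptr to a pamphlet METS file."""
    div = ET.Element(f"{{{METS_NS}}}div", {"TYPE": "section"})
    ET.SubElement(
        div,
        f"{{{METS_NS}}}mptr",
        {
            "LOCTYPE": "URL",
            # href_base is a hypothetical relative prefix, empty by default.
            f"{{{XLINK_NS}}}href": f"{href_base}{pamphlet_objid}.xml",
        },
    )
    return div
```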

requirement: @ID="structMap_physical"
head:
Applies to: pamphlet and bound volume METS documents.

In all cases, the following attributes will be present:
ID="structMap_physical"
TYPE="physical"

The top level will contain one div. This div will contain page level divs (see requirement ID="div_page").

requirement: @ID="div_page"
head:
Applies to: pamphlet and bound volume METS documents.

Page level divs represent a physical item page.

A page level div contains one fptr element for each file that represents that page, that is, one fptr per file type (file group).

div TYPE="page"

ORDER and ORDERLABEL will be present, equal to the physical order of the page. In the case of pamphlet METS documents, this will be equal to the sequential page number. In the case of bound volume METS documents, this will be equal to a sequential number beginning at 00000001.
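
A page level div can be sketched as follows. The function name and parameters are assumptions; in particular, writing ORDER and ORDERLABEL as the same plain integer is one reading of "equal to the physical order of the page", not something the profile confirms. The optional DMDID link applies to the logical structMap of pamphlet items, where page divs are linked to their page-level dmdSec.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_page_div(page_id, order, group_ids=("MASTER", "TXT", "IDX"),
                   link_dmd=False):
    """Build a TYPE="page" div with one fptr per file group."""
    attrs = {"TYPE": "page", "ORDER": str(order), "ORDERLABEL": str(order)}
    if link_dmd:
        # Logical structMap only: link to the page-level dmdSec.
        attrs["DMDID"] = f"dmdSec_{page_id}"
    div = ET.Element(f"{{{METS_NS}}}div", attrs)
    for group_id in group_ids:
        # FILEID follows the file ID rule: fileGrp ID + "_" + page identifier.
        ET.SubElement(
            div, f"{{{METS_NS}}}fptr", {"FILEID": f"{group_id}_{page_id}"}
        )
    return div
```

For a bound volume page, only the master image group applies, for example build_page_div("volume-front-00000001", 1, group_ids=("MASTER",)).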

technical_requirements:
content_files:
requirement: @ID="file_naming"

METS documents will be named by the OBJID + ".xml"

There will be one file per file group per physical page.

Files will be referenced from within METS documents using their path relative to the location of the METS document.

Filenames will be 8 characters in length, plus extension.

Files will be sequentially numbered, starting at 00000001.

In the case of pamphlet items: Content files will be arranged by sub-directory according to file group.
Examples:
./master/00000001.tif
./txt/00000007.txt
./idx/00000017.idx

In the case of bound volumes: Content files will be arranged by sub-directory according to their relevance to the volume.
Examples:
./volume/front/00000001.tif
./volume/back/00000001.tif

The physical page identifier is constructed from components of the relative path and filename (minus extension).
Examples (pamphlet item):
00000001
00000007
00000017
Examples (bound volume):
volume-front-00000001
volume-back-00000001
The physical page identifier will be used in construction of METS element IDs for those elements relating to files or their metadata. See requirement ID="file" for example.
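
The derivation of the physical page identifier from a relative path can be sketched as below. The function names are assumptions, as is the rule that a leading "volume" directory marks a bound volume file; the rule is inferred from the examples in this requirement.

```python
from pathlib import PurePosixPath


def physical_page_id(rel_path):
    """Derive the physical page identifier from a relative content path."""
    path = PurePosixPath(rel_path)  # a leading "./" is dropped from parts
    parts = path.parts
    if parts and parts[0] == "volume":
        # Bound volume: join the directory components and the file stem.
        return "-".join(parts[:-1] + (path.stem,))
    # Pamphlet item: the file stem alone is the identifier.
    return path.stem


def file_id(rel_path):
    """Derive the file ID: fileGrp ID + "_" + physical page identifier."""
    parts = PurePosixPath(rel_path).parts
    # Bound volumes hold master images only; otherwise the sub-directory
    # name is assumed to be the lower-case fileGrp ID.
    group_id = "MASTER" if parts[0] == "volume" else parts[0].upper()
    return f"{group_id}_{physical_page_id(rel_path)}"
```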

requirement: @ID="master_image_files"

Master image files will be in TIFF 6.0 format.

TIFF files will be compressed using LZW.

requirement: @ID="OCR_files"

OCR output will be present in plain text and word co-ordinated XML format.

Word co-ordinated XML is in IDX format. This is a derivative of Abbyy FineReader SDK XML output, generated by the OCR workflow software: Agora (SRZ Berlin).

IDX XML files contain:
A <milestone unit> indicating the dimensions (width and height, in pixels) of the source image.
Individual word locations <w>, given in pixels relative to the dimensions of the source image. Word locations contain co-ordinates of: left (l), top (t), width (w) and height (h).
Individual words are contained within sentences <s>. Sentences are contained within paragraphs <p>. Sentences and paragraphs themselves do not contain co-ordinates.
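
Reading word locations from an IDX file might look like the sketch below. The IDX schema is not reproduced in this profile, so the sample document is hypothetical: the root element name is invented, and the co-ordinates l, t, w and h are assumed to be attributes of each <w> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real IDX files may be structured differently.
SAMPLE_IDX = (
    "<doc><p><s>"
    '<w l="10" t="20" w="30" h="12">Reform</w>'
    '<w l="45" t="20" w="22" h="12">Bill</w>'
    "</s></p></doc>"
)


def word_boxes(idx_xml):
    """Return (text, left, top, right, bottom) for every word, in pixels."""
    root = ET.fromstring(idx_xml)
    boxes = []
    for word in root.iter("w"):
        left, top, width, height = (
            int(word.get(key)) for key in ("l", "t", "w", "h")
        )
        # Convert width and height into right and bottom edges.
        boxes.append((word.text, left, top, left + width, top + height))
    return boxes
```

Converting to edge co-ordinates is a design choice for this sketch; it suits tasks such as highlighting search hits on the master image.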

tool:
agency:
BOPCRIS
note:

The digitization workflow at BOPCRIS is co-ordinated by a relational database. This database is also used by scanner operators to capture metadata for mapping to BOPCRIS Descriptors format. METS documents are generated as the product of a combination of:
An export from this database
Extraction of technical metadata from digital files (standard file information and TIFF headers)
The bibliographic record for the item

These tools are currently for internal use only.

  September 10, 2013
