ALTO Technical Metadata for Optical Character Recognition (OCR)

Using ALTO with METS

ALTO was created for use with METS. Wrapping ALTO instances in METS is not required, but most implementers have chosen to use ALTO inside a METS wrapper. To do so, references to the ALTO content are made in the METS <area> element, which is used within the METS <structMap>.

[Screenshot: a METS <area> element]
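As a rough illustration of where these references live, the following Python sketch (standard library only) lists the FILEID, BEGIN and BETYPE attributes of every <area> inside a <structMap>. The sample METS fragment, its IDs (ALTO0001, TB1) and the BETYPE="IDREF" value are hypothetical and not drawn from any particular METS profile.

```python
import xml.etree.ElementTree as ET

# METS namespace URI.
METS_NS = "http://www.loc.gov/METS/"

# Hypothetical METS fragment: a logical <div> whose <fptr> holds an <area>
# pointing at one block of an ALTO file. IDs are illustrative only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="LOGICAL">
    <mets:div TYPE="ARTICLE" LABEL="Sample article">
      <mets:fptr>
        <mets:area FILEID="ALTO0001" BEGIN="TB1" BETYPE="IDREF"/>
      </mets:fptr>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def area_references(mets_xml: str) -> list:
    """Collect FILEID, BEGIN and BETYPE from every <area> inside a <structMap>."""
    root = ET.fromstring(mets_xml)
    refs = []
    for struct_map in root.iter(f"{{{METS_NS}}}structMap"):
        for area in struct_map.iter(f"{{{METS_NS}}}area"):
            refs.append(
                {
                    "FILEID": area.get("FILEID"),
                    "BEGIN": area.get("BEGIN"),
                    "BETYPE": area.get("BETYPE"),
                }
            )
    return refs


if __name__ == "__main__":
    for ref in area_references(SAMPLE_METS):
        print(ref)
```

Each returned dictionary pairs a file reference (FILEID) with a location inside that file (BEGIN), which is the pairing the two attributes described below express.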

The FILEID attribute on <area> holds the ID of a <file> element in a METS <fileGrp> under <fileSec>, as in the following structure:

[Screenshot: a METS <fileGrp> element]
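A minimal Python sketch of resolving a FILEID: find the <file> under <fileSec> whose ID matches, then read the location from its <FLocat> child. It assumes the ALTO file is referenced by URL through xlink:href; the fileGrp USE value, the file ID and the path are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Prefix-to-URI map for ElementTree's limited XPath support.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical <fileSec> with one ALTO file; ID, USE value and href are illustrative.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="ocr">
      <mets:file ID="ALTO0001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def resolve_fileid(mets_xml: str, file_id: str) -> Optional[str]:
    """Return the FLocat xlink:href of the <file> whose ID equals file_id, or None."""
    root = ET.fromstring(mets_xml)
    # "//" also matches <file> elements inside nested <fileGrp> elements.
    for file_el in root.iterfind("mets:fileSec//mets:file", NS):
        if file_el.get("ID") == file_id:
            flocat = file_el.find("mets:FLocat", NS)
            if flocat is not None:
                return flocat.get(f"{{{NS['xlink']}}}href")
    return None


if __name__ == "__main__":
    print(resolve_fileid(SAMPLE_METS, "ALTO0001"))
```

Comparing the ID attribute in Python, rather than building an XPath predicate from the input string, avoids any quoting problems in the query.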

The BEGIN attribute on <area> then points to a location inside the ALTO file itself, which is held within one of the children of the METS <amdSec>.
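A minimal sketch of following BEGIN into the ALTO content, assuming BETYPE="IDREF" (so BEGIN holds the ID of an ALTO element) and that the ALTO XML has already been extracted from the METS document as a string; that extraction step is not shown. The ALTO fragment, its namespace version and its IDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical ALTO fragment; the v4 namespace URI and all IDs are illustrative.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String ID="S1" CONTENT="Hello"/>
            <SP/>
            <String ID="S2" CONTENT="world"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so matching works whatever ALTO version is used."""
    return tag.rsplit("}", 1)[-1]


def text_at_begin(alto_xml: str, begin: str) -> Optional[str]:
    """Return the words under the ALTO element whose ID equals begin, or None."""
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        if el.get("ID") == begin:
            # iter() includes el itself, so a BEGIN pointing at a single <String> also works.
            return " ".join(
                s.get("CONTENT", "") for s in el.iter() if local_name(s.tag) == "String"
            )
    return None


if __name__ == "__main__":
    print(text_at_begin(SAMPLE_ALTO, "TB1"))
```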

ALTO (Analyzed Layout and Text Object) is an XML schema that details technical metadata describing the layout and content of physical text resources, such as the pages of a book or a newspaper. It most commonly serves as an extension schema within the administrative metadata section of the Metadata Encoding and Transmission Standard (METS). However, ALTO instances can also exist as standalone documents used independently of METS.
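To show what "layout and content" means in practice, this Python sketch reads every ALTO <String> and returns its text together with its position and size (HPOS, VPOS, WIDTH, HEIGHT), expressed in whatever unit the file's MeasurementUnit declares. The one-line sample page, its namespace version and its coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-line ALTO page; namespace version, IDs and coordinates are illustrative.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="420" HEIGHT="40">
            <String ID="S1" CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP HPOS="280" VPOS="200" WIDTH="20"/>
            <String ID="S2" CONTENT="world" HPOS="300" VPOS="200" WIDTH="220" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def words_with_boxes(alto_xml: str) -> list:
    """Return (content, hpos, vpos, width, height) for every ALTO <String>."""
    root = ET.fromstring(alto_xml)
    words = []
    for el in root.iter():
        # Compare local names so the sketch is not tied to one ALTO namespace version.
        if el.tag.rsplit("}", 1)[-1] == "String":
            box = tuple(float(el.get(a, "0")) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            words.append((el.get("CONTENT", ""),) + box)
    return words


if __name__ == "__main__":
    for word in words_with_boxes(SAMPLE_ALTO):
        print(word)
```

Because the same parsing works on a standalone ALTO file or on ALTO content taken out of a METS wrapper, the sketch applies to either of the usages described above.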