ALTO Technical Metadata for Optical Character Recognition (OCR)

ALTO Principles

ALTO stores the layout information and OCR-recognized text of pages from any kind of printed document, such as books, journals, and newspapers. It is a standardized XML format for layout and content information, designed as an extension schema for the Library of Congress' Metadata Encoding and Transmission Schema (METS): METS provides metadata and structural information, while ALTO contains content and physical information.

Each ALTO file contains a styles section that lists the different styles (for paragraphs and fonts). The layout section describes what is on the page. A page is divided into several regions (print space, left margin, right margin, top margin, and bottom margin), and for each region, all objects detected inside it are listed.
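The structure described above can be illustrated with a minimal sketch in Python. The XML fragment is hand-written for illustration and is not taken from a real scan; the IDs, coordinates, and the summarize_alto helper are hypothetical, and the ALTO version 3 namespace is assumed. Files produced with other schema versions use a different namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumption: the file follows ALTO version 3; other versions use another URI.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hypothetical, hand-written ALTO fragment (illustrative values only).
ALTO_SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Description>
    <MeasurementUnit>mm10</MeasurementUnit>
  </Description>
  <Styles>
    <TextStyle ID="TXT_0" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR_0" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" HEIGHT="2970" WIDTH="2100">
      <TopMargin ID="P1_TM" HPOS="0" VPOS="0" WIDTH="2100" HEIGHT="200"/>
      <PrintSpace ID="P1_PS" HPOS="150" VPOS="200" WIDTH="1800" HEIGHT="2570">
        <TextBlock ID="TB1" HPOS="150" VPOS="200" WIDTH="1800" HEIGHT="60">
          <TextLine ID="TL1" HPOS="150" VPOS="200" WIDTH="1800" HEIGHT="60">
            <String ID="S1" HPOS="150" VPOS="200" WIDTH="300" HEIGHT="60" CONTENT="Hello"/>
            <SP HPOS="450" VPOS="200" WIDTH="30"/>
            <String ID="S2" HPOS="480" VPOS="200" WIDTH="320" HEIGHT="60" CONTENT="ALTO"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def summarize_alto(xml_text):
    """Return the unit, style IDs, page regions, and text lines of one ALTO page."""
    root = ET.fromstring(xml_text)

    # The measurement unit applies to all coordinates in the file.
    unit = root.findtext("alto:Description/alto:MeasurementUnit", namespaces=NS)

    # The styles section lists font and paragraph styles by ID.
    style_ids = [style.get("ID") for style in root.find("alto:Styles", NS)]

    # The direct children of Page are its regions (margins and print space).
    page = root.find("alto:Layout/alto:Page", NS)
    regions = [child.tag.split("}")[1] for child in page]

    # Each TextLine holds String elements; CONTENT carries the recognized word.
    lines = []
    for text_line in page.iterfind(".//alto:TextLine", NS):
        words = [s.get("CONTENT") for s in text_line.findall("alto:String", NS)]
        lines.append(" ".join(words))

    return {"unit": unit, "styles": style_ids, "regions": regions, "lines": lines}


summary = summarize_alto(ALTO_SAMPLE)
print(summary)
```

For this sample, the summary reports the unit mm10, two styles, the regions TopMargin and PrintSpace, and a single text line.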

Measurements in ALTO XML files are given in 1/10 mm or in 1/1200 inch. For presentation purposes, one might want to create low-resolution images. To use the coordinates from the ALTO file with an image of any resolution, they must be converted into pixels.

Converting inch1200 values to pixels depends on the image resolution, expressed in dots per inch (dpi). Convert the values into pixels as follows:

  • pixel = value * resolution / 1200

For 1/10 mm values, where one inch equals 254 tenths of a millimetre, convert the values into pixels as follows:

  • pixel = value * resolution / 254
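The two formulas above can be combined into one small helper, sketched here in Python. The function name alto_to_pixels is hypothetical. The unit names "inch1200" and "mm10" are assumed to match the values of the ALTO MeasurementUnit element, and the "pixel" pass-through is an addition for files that already store pixel coordinates. The resolution is assumed to be given in dpi.

```python
def alto_to_pixels(value, unit, dpi):
    """Convert an ALTO coordinate to pixels for an image of the given dpi."""
    if unit == "inch1200":
        # 1200 units per inch, so value / 1200 is the length in inches.
        return value * dpi / 1200
    if unit == "mm10":
        # One inch is 25.4 mm, i.e. 254 units of 1/10 mm.
        return value * dpi / 254
    if unit == "pixel":
        # Assumption: already in pixels, nothing to convert.
        return float(value)
    raise ValueError(f"unsupported measurement unit: {unit!r}")


# A 2100 x 2970 page in 1/10 mm (A4) rendered at a low resolution of 72 dpi.
width_px = round(alto_to_pixels(2100, "mm10", 72))
height_px = round(alto_to_pixels(2970, "mm10", 72))
print(width_px, height_px)
```

Rounding is left to the caller, since whether to round, floor, or keep fractional pixel values depends on how the coordinates are used, for example when drawing highlight boxes over a scaled image.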

ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Schema (METS). However, an ALTO instance can also exist as a standalone document, used independently of METS.