ALTO Technical Metadata for Optical Character Recognition (OCR)

Structure of ALTO Files

An ALTO file consists of three major sections as children of the root <alto> element:

  • <Description>
  • <Styles>
  • <Layout>

The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.

The <Styles> section contains the text and paragraph styles with their individual descriptions:

  • <TextStyle> has font descriptions
  • <ParagraphStyle> has paragraph descriptions, e.g. alignment information
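As an illustration of how these style descriptions might be consumed, the following Python sketch builds a lookup table of styles and resolves the references on a block. The sample fragment is hypothetical and omits the namespace; the attribute names (ID, FONTFAMILY, FONTSIZE, ALIGN, STYLEREFS) are assumed from common ALTO usage and should be checked against the schema version in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ALTO fragment; values are made up for illustration.
SAMPLE = """
<alto>
  <Styles>
    <TextStyle ID="TXT_0" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR_0" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="TB1" STYLEREFS="TXT_0 PAR_0"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def style_table(root):
    """Map each style ID to (element name, remaining attributes)."""
    styles = {}
    for el in root.iter():
        if local(el.tag) in ("TextStyle", "ParagraphStyle"):
            attrs = dict(el.attrib)
            style_id = attrs.pop("ID")
            styles[style_id] = (local(el.tag), attrs)
    return styles


root = ET.fromstring(SAMPLE)
styles = style_table(root)

# Resolve the space-separated style references on the first TextBlock.
block = next(el for el in root.iter() if local(el.tag) == "TextBlock")
resolved = [styles[ref] for ref in block.get("STYLEREFS", "").split()]
print(resolved)
```

Stripping the namespace prefix keeps the sketch usable for files that declare an ALTO namespace as well as for this bare sample.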

The <Layout> section contains the content information. It is subdivided into <Page> elements.

A page consists of margins and a print space, all of which are non-intersecting rectangular areas within the page area. Each of these areas can contain any number of objects, such as lines, images, and text blocks. A text block is divided into text lines, which are in turn divided into strings and spaces.
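The nesting of text blocks, text lines, strings, and spaces can be sketched with a short Python example that rebuilds the text of each line. The sample fragment is hypothetical and omits the namespace; the element and attribute names (TextBlock, TextLine, String, SP, CONTENT) are assumed from common ALTO usage.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO fragment (no namespace, no coordinates).
SAMPLE = """
<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine>
            <String CONTENT="again"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def block_lines(root):
    """Return {TextBlock ID: [text of each TextLine]}."""
    result = {}
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block:
            if local(line.tag) != "TextLine":
                continue
            parts = []
            for item in line:
                name = local(item.tag)
                if name == "String":
                    parts.append(item.get("CONTENT", ""))
                elif name == "SP":
                    parts.append(" ")  # an SP element stands for a space
            lines.append("".join(parts))
        result[block.get("ID")] = lines
    return result


lines_by_block = block_lines(ET.fromstring(SAMPLE))
print(lines_by_block)
```

Only strings and spaces are handled here; any other children of a text line are ignored in this sketch.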

The global structure of the ALTO file is as follows:

<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
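
One way to check a file against this outline is to list the children of each top-level section. The Python sketch below does this for a copy of the skeleton above; the function name section_outline is hypothetical, and a real ALTO file would normally also carry a namespace and attributes, which the helper tolerates by stripping the namespace prefix.

```python
import xml.etree.ElementTree as ET

# Copy of the structural skeleton, used here as sample input.
SKELETON = """
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
"""


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def section_outline(xml_text):
    """Map each top-level section name to the names of its direct children."""
    root = ET.fromstring(xml_text)
    return {local(sec.tag): [local(child.tag) for child in sec] for sec in root}


outline = section_outline(SKELETON)
for section, children in outline.items():
    print(section, "->", ", ".join(children))
```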

ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Schema (METS). However, ALTO instances can also exist as standalone documents, independent of METS.