ALTO Technical Metadata for Layout and Text Objects

Structure of ALTO Files

An ALTO file consists of three major sections as children of the root <alto> element:

  • <Description>
  • <Styles>
  • <Layout>
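To make the three-part layout concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It parses a tiny, hypothetical ALTO document and lists the root's children in order. The namespace URI in the sample assumes ALTO v4. The helper names `local_name` and `top_level_sections` are illustrative only. Because the helpers compare local names, they ignore the namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document. The namespace URI assumes ALTO v4;
# the helpers compare local names only, so other versions parse the same way.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Description/>
  <Styles/>
  <Layout/>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def top_level_sections(xml_text: str) -> list[str]:
    """Return the local names of the root's children in document order."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError("root element is not <alto>")
    return [local_name(child.tag) for child in root]


if __name__ == "__main__":
    print(top_level_sections(SAMPLE))  # ['Description', 'Styles', 'Layout']
```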

The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.
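The following sketch shows how a reader might pull a few of these metadata fields out of <Description>. It assumes the ALTO element names MeasurementUnit, sourceImageInformation/fileName and processingSoftware/softwareName. How the processing information is nested differs between ALTO versions, so the sketch searches for softwareName anywhere below <Description>. The file name, software name, and the `describe` function are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical example; the file and software names are placeholders.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
    <sourceImageInformation>
      <fileName>page_0001.tif</fileName>
    </sourceImageInformation>
    <Processing>
      <processingSoftware>
        <softwareName>ExampleOCR</softwareName>
      </processingSoftware>
    </Processing>
  </Description>
  <Layout/>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def first_text(parent: ET.Element, name: str) -> Optional[str]:
    """Stripped text of the first descendant with the given local name, or None."""
    for elem in parent.iter():
        if local_name(elem.tag) == name and elem.text:
            return elem.text.strip()
    return None


def describe(xml_text: str) -> dict:
    """Collect a few Description fields; returns {} if there is no <Description>."""
    root = ET.fromstring(xml_text)
    desc = next((c for c in root if local_name(c.tag) == "Description"), None)
    if desc is None:
        return {}
    return {
        "measurement_unit": first_text(desc, "MeasurementUnit"),
        "source_image": first_text(desc, "fileName"),
        # Searched anywhere below Description because the nesting varies by version.
        "software": first_text(desc, "softwareName"),
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
```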

The <Styles> section contains the text and paragraph styles with their individual descriptions:

  • <TextStyle> has font descriptions
  • <ParagraphStyle> has paragraph descriptions, e.g. alignment information
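Layout elements point to these styles by ID. The sketch below assumes ALTO's STYLEREFS attribute, a whitespace-separated list of style IDs. It builds an ID-to-attributes map from <TextStyle> and <ParagraphStyle> and then resolves the references on individual elements. The IDs and font values in the sample, as well as `load_styles` and `resolve_styles`, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical example document; IDs and font values are placeholders.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Styles>
    <TextStyle ID="TS1" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR1" ALIGN="Block"/>
  </Styles>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1" STYLEREFS="PAR1">
          <TextLine ID="TL1">
            <String ID="S1" CONTENT="Hello" STYLEREFS="TS1"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

STYLE_KINDS = ("TextStyle", "ParagraphStyle")


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def load_styles(root: ET.Element) -> dict[str, dict[str, str]]:
    """Map each style ID to its attributes plus a 'kind' entry."""
    styles = {}
    for elem in root.iter():
        kind = local_name(elem.tag)
        if kind in STYLE_KINDS and "ID" in elem.attrib:
            attrs = {k: v for k, v in elem.attrib.items() if k != "ID"}
            attrs["kind"] = kind
            styles[elem.attrib["ID"]] = attrs
    return styles


def resolve_styles(elem: ET.Element, styles: dict) -> list[dict[str, str]]:
    """Look up the whitespace-separated IDs in an element's STYLEREFS."""
    return [styles[ref] for ref in elem.get("STYLEREFS", "").split() if ref in styles]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    styles = load_styles(root)
    for elem in root.iter():
        if local_name(elem.tag) in ("TextBlock", "String"):
            print(elem.get("ID"), resolve_styles(elem, styles))
```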

The <Layout> section contains the content information. It is subdivided into <Page> elements.
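Because <Layout> is simply a sequence of <Page> elements, enumerating the pages is a short loop. The sketch assumes the Page attributes ID, WIDTH and HEIGHT. The two-page sample and the `list_pages` function are hypothetical, and missing sizes are reported as NaN.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-page document; sizes are in the file's MeasurementUnit.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="1000" HEIGHT="1400"/>
    <Page ID="P2" PHYSICAL_IMG_NR="2" WIDTH="1000" HEIGHT="1400"/>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def list_pages(xml_text: str) -> list[tuple[str, float, float]]:
    """Return (ID, WIDTH, HEIGHT) for each <Page> directly under <Layout>."""
    root = ET.fromstring(xml_text)
    pages = []
    for layout in (c for c in root if local_name(c.tag) == "Layout"):
        for page in layout:
            if local_name(page.tag) == "Page":
                pages.append((
                    page.get("ID", ""),
                    float(page.get("WIDTH", "nan")),   # NaN if the size is missing
                    float(page.get("HEIGHT", "nan")),
                ))
    return pages


if __name__ == "__main__":
    for page_id, width, height in list_pages(SAMPLE):
        print(page_id, width, height)
```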

A page consists of margins and a printspace, all of which are non-intersecting rectangular areas within the page area. Each of these areas can contain any number of objects, such as lines, images, textblocks and more. A textblock is divided into textlines, and each textline is further divided into strings and spaces.
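The non-intersection rule for margins and printspace can be checked mechanically. The sketch below assumes the ALTO element names TopMargin, LeftMargin, RightMargin, BottomMargin and PrintSpace, along with the positional attributes HPOS, VPOS, WIDTH and HEIGHT. It reports every pair of areas on a page that share interior area. The sample geometry and `overlapping_areas` are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical page: four margins around a central print space.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="1000" HEIGHT="1400">
      <TopMargin HPOS="0" VPOS="0" WIDTH="1000" HEIGHT="100"/>
      <LeftMargin HPOS="0" VPOS="100" WIDTH="100" HEIGHT="1200"/>
      <RightMargin HPOS="900" VPOS="100" WIDTH="100" HEIGHT="1200"/>
      <BottomMargin HPOS="0" VPOS="1300" WIDTH="1000" HEIGHT="100"/>
      <PrintSpace HPOS="100" VPOS="100" WIDTH="800" HEIGHT="1200"/>
    </Page>
  </Layout>
</alto>"""

PAGE_AREAS = {"TopMargin", "LeftMargin", "RightMargin", "BottomMargin", "PrintSpace"}
POSITION = ("HPOS", "VPOS", "WIDTH", "HEIGHT")


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def intersects(a: tuple, b: tuple) -> bool:
    """True if two (x, y, w, h) rectangles share interior area.

    Strict comparisons: rectangles that only touch along an edge do not count.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def overlapping_areas(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each Page ID to the pairs of margin/printspace areas that overlap."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iter():
        if local_name(page.tag) != "Page":
            continue
        # Keep only page areas that carry all four position attributes.
        areas = [
            (local_name(c.tag), tuple(float(c.get(attr)) for attr in POSITION))
            for c in page
            if local_name(c.tag) in PAGE_AREAS and all(attr in c.attrib for attr in POSITION)
        ]
        result[page.get("ID", "")] = [
            (name_a, name_b)
            for (name_a, rect_a), (name_b, rect_b) in itertools.combinations(areas, 2)
            if intersects(rect_a, rect_b)
        ]
    return result


if __name__ == "__main__":
    print(overlapping_areas(SAMPLE))  # {'P1': []}
```

Edges that merely touch, such as the left margin ending where the printspace begins, are deliberately not treated as intersections.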

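The global structure of an ALTO file is as follows:

  • <alto> (root element)
    • <Description>
    • <Styles>
      • <TextStyle>
      • <ParagraphStyle>
    • <Layout>
      • <Page>
        • margins (<TopMargin>, <LeftMargin>, <RightMargin>, <BottomMargin>) and <PrintSpace>
          • <TextBlock>, plus images, lines and other objects
            • <TextLine>
              • <String> and <SP> (space)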
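A common use of this hierarchy is plain-text extraction. The sketch walks Layout, then Page, then TextBlock, then TextLine, and joins each line's String CONTENT values, emitting one blank for each SP. It assumes the CONTENT attribute on <String> and ignores other line children such as hyphenation marks (HYP). The sample text and `page_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-page document following the hierarchy
# Layout > Page > PrintSpace > TextBlock > TextLine > String / SP.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="1000" HEIGHT="1400">
      <PrintSpace HPOS="100" VPOS="100" WIDTH="800" HEIGHT="1200">
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Analyzed"/><SP/><String CONTENT="Layout"/>
          </TextLine>
          <TextLine ID="TL2">
            <String CONTENT="and"/><SP/><String CONTENT="Text"/><SP/><String CONTENT="Object"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def children(elem: ET.Element, name: str) -> list[ET.Element]:
    """Direct children with the given local name."""
    return [c for c in elem if local_name(c.tag) == name]


def page_text(xml_text: str) -> dict[str, list[str]]:
    """Map each Page ID to the plain text of its lines, in document order."""
    root = ET.fromstring(xml_text)
    result = {}
    for layout in children(root, "Layout"):
        for page in children(layout, "Page"):
            lines = []
            # Text blocks may sit in the printspace or in a margin, so search the whole page.
            for block in (e for e in page.iter() if local_name(e.tag) == "TextBlock"):
                for line in children(block, "TextLine"):
                    parts = []
                    for token in line:
                        kind = local_name(token.tag)
                        if kind == "String":
                            parts.append(token.get("CONTENT", ""))
                        elif kind == "SP":
                            parts.append(" ")
                        # Other children (e.g. HYP) are ignored in this sketch.
                    lines.append("".join(parts))
            result[page.get("ID", "")] = lines
    return result


if __name__ == "__main__":
    print(page_text(SAMPLE))
```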
