ALTO Technical Metadata for Layout and Text Objects

Structure of ALTO Files

An ALTO file consists of three major sections as children of the root <alto> element:

  • <Description>
  • <Styles>
  • <Layout>

The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.

The <Styles> section contains the text and paragraph styles with their individual descriptions:

  • <TextStyle> has font descriptions
  • <ParagraphStyle> has paragraph descriptions, e.g. alignment information

The <Layout> section contains the content information. It is subdivided into <Page> elements.

A page consists of margins and a print space, all of which are non-intersecting rectangular areas within the page area. Each of these areas can contain any number of objects, such as lines, images, and text blocks. A text block is divided into text lines, which are in turn divided into strings and spaces.
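The nesting of text blocks, text lines, strings and spaces can be illustrated with a short Python sketch that rebuilds the plain text of each line. This is only an illustration under stated assumptions: the element and attribute names used here (TextBlock, TextLine, String, SP, CONTENT) follow ALTO schema naming and are not listed in this section, and the sample fragment and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal fragment. The names TextBlock, TextLine, String, SP and
# the CONTENT attribute are assumed from the ALTO schema, not from this page.
SAMPLE = """<alto>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(element):
    """Return the tag name without a '{namespace-uri}' prefix, if present."""
    return element.tag.rsplit("}", 1)[-1]


def line_text(text_line):
    """Join the strings and spaces of one text line into plain text."""
    parts = []
    for child in text_line:
        name = local_name(child)
        if name == "String":
            parts.append(child.get("CONTENT", ""))
        elif name == "SP":
            parts.append(" ")
    return "".join(parts)


def page_lines(root):
    """Collect the text of every text line, in document order."""
    return [line_text(el) for el in root.iter() if local_name(el) == "TextLine"]


root = ET.fromstring(SAMPLE)
lines = page_lines(root)
for line in lines:
    print(line)
```

Stripping the namespace prefix in local_name is a design choice of this sketch: real ALTO files usually declare a version-specific namespace, and comparing local names lets the same code read files with or without one.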

The global structure of the ALTO file is as follows:

<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
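As a rough sketch of how this skeleton could be inspected programmatically, the following Python snippet parses the structure shown above and lists the children of each major section. It uses only the standard library. The variable and function names are hypothetical, and the snippet assumes a file without an XML namespace, as in the skeleton; it strips a namespace prefix if one is present.

```python
import xml.etree.ElementTree as ET

# The skeleton from this section, used here as in-memory test input.
SKELETON = """<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def tag_name(element):
    """Return the tag name without a '{namespace-uri}' prefix, if present."""
    return element.tag.rsplit("}", 1)[-1]


def child_names(element):
    """List the tag names of the direct children of an element."""
    return [tag_name(child) for child in element]


alto_root = ET.fromstring(SKELETON)

# Map each major section to the names of its direct children.
sections = {tag_name(section): child_names(section) for section in alto_root}

# Collect the rectangular areas of every <Page> inside <Layout>.
page_areas = [
    child_names(page)
    for layout in alto_root
    if tag_name(layout) == "Layout"
    for page in layout
    if tag_name(page) == "Page"
]

for name, children in sections.items():
    print(name, "->", ", ".join(children))
```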