Structure of ALTO Files
An ALTO file consists of three major sections as children of the root <alto> element:
<Description>
<Styles>
<Layout>
The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.
The <Styles> section contains the text and paragraph styles with their individual descriptions:
<TextStyle> has font descriptions
<ParagraphStyle> has paragraph descriptions, e.g. alignment information
The <Layout> section contains the content information. It is subdivided into <Page> elements.
A page consists of margins and a print space, all of which are non-intersecting rectangular areas within the page area. Each of these can contain any number of objects such as lines, images, text blocks and more. A text block is divided into text lines, which are in turn divided into strings and spaces.
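The nesting of text blocks, text lines, strings and spaces can be illustrated with a short Python sketch that rebuilds the plain text of one block. It assumes the ALTO element names <TextBlock>, <TextLine>, <String> (with its CONTENT attribute) and <SP>; the sample fragment, the omitted namespace, and the helper names local and block_lines are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ALTO fragment (namespace omitted for brevity).
SAMPLE = """<alto>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine>
            <String CONTENT="again"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local(tag):
    """Return the tag name without a leading '{namespace}' part, if any."""
    return tag.rsplit("}", 1)[-1]


def block_lines(block):
    """Return one plain-text string per text line of a text block."""
    lines = []
    for line in block:
        if local(line.tag) != "TextLine":
            continue
        parts = []
        for item in line:
            name = local(item.tag)
            if name == "String":
                parts.append(item.get("CONTENT", ""))
            elif name == "SP":
                parts.append(" ")  # a space element separates two strings
        lines.append("".join(parts))
    return lines


root = ET.fromstring(SAMPLE)
blocks = [el for el in root.iter() if local(el.tag) == "TextBlock"]
text_lines = block_lines(blocks[0])
print(text_lines)
```

Stripping the namespace in local keeps the sketch usable on files that declare an ALTO namespace as well as on this bare fragment; a production reader would more likely match the namespace explicitly.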
The global structure of the ALTO file is as follows:
<alto>
    <Description>
        <MeasurementUnit/>
        <sourceImageInformation/>
        <Processing/>
    </Description>
    <Styles>
        <TextStyle/>
        <ParagraphStyle/>
    </Styles>
    <Layout>
        <Page>
            <TopMargin/>
            <LeftMargin/>
            <RightMargin/>
            <BottomMargin/>
            <PrintSpace/>
        </Page>
    </Layout>
</alto>
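As a rough illustration of how this structure looks to a program, the following Python sketch parses the skeleton above and lists the three major sections and the regions of the first page. It is a sketch under assumptions: the skeleton is embedded without a namespace, and the helper name child_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# The skeleton from the text, embedded as a string (no namespace declared).
SKELETON = """<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def child_names(element):
    """Return the tag names of the direct children, without any namespace."""
    return [child.tag.rsplit("}", 1)[-1] for child in element]


root = ET.fromstring(SKELETON)

# The three major sections are the direct children of the root element.
sections = child_names(root)
print(sections)

# The margins and the print space are the direct children of a page.
page = root.find("Layout/Page")
regions = child_names(page)
print(regions)
```

Note that root.find("Layout/Page") only matches because the embedded skeleton has no namespace; for a namespaced file the path would need the namespace prefix on each step.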