ALTO Technical Metadata for Optical Character Recognition (OCR)

      • Version 2.1 includes the following changes:
        1. Page and BlockType element HEIGHT, WIDTH, HPOS, VPOS attribute types changed to xsd:float from xsd:int.
        2. CircleType HPOS, VPOS and RADIUS attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        3. EllipseType HPOS, VPOS, HLENGTH and VLENGTH attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        4. MeasurementUnit defined as mandatory and element annotation clarified.
        5. HYP element's CONTENT attribute type definition added as xsd:string.
        6. Tags (LayoutTag/StructureTag/RoleTag/NamedEntityTag/OtherTag) added to allow for tagging content. TAGREFS attribute added to BlockTypes, TextLine and String.
        7. CS attribute added to String and Block.
        8. LANG attribute added to String, TextLine and TextBlock. Language attribute in TextBlock deprecated.
      • The Library of Congress now serves as the official maintenance agency for ALTO versions 2.0 and higher. The current 2.0 version of the schema entirely implements the previous 1.4 version of the schema with two exceptions: the addition of a namespace URI, and an updated URI import reference for the inclusion of XLink functionality.
      • The same schema as v2, prior to conversion to LOC namespace.
      • The standard XLink Schema used in all applicable Library of Congress XML Schemas. The XLink Schema is imported into the ALTO 2.0 schema by default.
ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the Metadata Encoding and Transmission Schema (METS) administrative metadata section. However, ALTO instances can also exist as standalone documents used independently of METS.
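
As a rough illustration of what a standalone ALTO instance might look like and how its layout metadata could be read, the following Python sketch parses a small hand-written fragment using only the standard library. The sample XML, the ALTO_NS constant, and the extract_strings helper are assumptions made for this example; the fragment has not been validated against the schema and uses only a few commonly seen elements (MeasurementUnit, Page, TextBlock, TextLine, String). Coordinates are read as floats, in line with the xsd:float typing described for version 2.1.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by ALTO 2.x instances (assumed for this example).
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hand-written illustrative fragment; NOT validated against the ALTO schema.
SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="2480.0" HEIGHT="3508.0">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100.5" VPOS="200.0" WIDTH="800.0" HEIGHT="40.0">
          <TextLine HPOS="100.5" VPOS="200.0" WIDTH="800.0" HEIGHT="40.0">
            <String CONTENT="Hello" HPOS="100.5" VPOS="200.0" WIDTH="120.0" HEIGHT="40.0"/>
            <SP WIDTH="10.0"/>
            <String CONTENT="world" HPOS="230.5" VPOS="200.0" WIDTH="130.0" HEIGHT="40.0"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def extract_strings(alto_xml):
    """Return the measurement unit and a list of recognized words with positions.

    Hypothetical helper: it reads every String element in document order and
    converts the positional attributes to float.
    """
    root = ET.fromstring(alto_xml)
    unit = root.findtext("alto:Description/alto:MeasurementUnit", namespaces=ALTO_NS)
    words = []
    for string_el in root.iterfind(".//alto:String", ALTO_NS):
        words.append(
            {
                "content": string_el.get("CONTENT"),
                "hpos": float(string_el.get("HPOS")),
                "vpos": float(string_el.get("VPOS")),
                "width": float(string_el.get("WIDTH")),
                "height": float(string_el.get("HEIGHT")),
            }
        )
    return unit, words


if __name__ == "__main__":
    unit, words = extract_strings(SAMPLE)
    print("Measurement unit:", unit)
    for word in words:
        print(word["content"], word["hpos"], word["vpos"])
```

When ALTO is embedded in METS rather than stored standalone, the same parsing would presumably apply to the alto element once it has been located inside the METS wrapper.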

February 20, 2014