ALTO Technical Metadata for Optical Character Recognition (OCR)



  • About ALTO
  • Community
  • Versions, Namespaces and Schema Locations

    ALTO schema version numbers change by whole numbers for changes that break backward compatibility (version 1 to version 2) and by decimals for changes that do not (2.0 to 2.1). The namespace changes only with major versions (ns-v2 to ns-v3). Schema file locations follow this pattern: each major version has its own subdirectory at www.loc.gov/alto, and the current schema (the latest minor version) is called alto.xsd in that directory.

    That directory also holds a copy named for each particular minor version, including an exact copy of the current version. For example, the current version 3.0 is available at both http://www.loc.gov/alto/v3/alto.xsd and http://www.loc.gov/alto/v3/alto-3-0.xsd. All previous versions will continue to be available in other subdirectories.

    To ensure this does not break current practice, the schema at http://www.loc.gov/alto/alto.xsd (version 2.1 in August 2014) will continue to exist, but will no longer be updated to reflect the current version.
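The location pattern described above can be sketched in a few lines of Python. This is an illustration only: the function names `alto_schema_urls` and `is_breaking_change` are hypothetical, and the sketch assumes the pattern applies to version 3 and later exactly as stated (major-version subdirectory, `alto.xsd` for the current minor version, `alto-MAJOR-MINOR.xsd` for a pinned copy).

```python
def alto_schema_urls(major: int, minor: int,
                     base: str = "http://www.loc.gov/alto") -> dict:
    """Build schema locations for one ALTO version (hypothetical helper).

    "current" always points at the latest minor version of that major
    version, so it matches the requested minor version only while that
    minor version is the newest one. "pinned" names one minor version.
    """
    directory = f"{base}/v{major}"
    return {
        "current": f"{directory}/alto.xsd",
        "pinned": f"{directory}/alto-{major}-{minor}.xsd",
    }


def is_breaking_change(old_version: str, new_version: str) -> bool:
    """Per the versioning rule, only a change in the whole number
    (major version) breaks backward compatibility."""
    old_major = int(old_version.split(".")[0])
    new_major = int(new_version.split(".")[0])
    return old_major != new_major


urls = alto_schema_urls(3, 0)
print(urls["current"])
print(urls["pinned"])
```

A processor that must validate against one exact schema would reference the pinned location, while one that wants to follow compatible updates would reference the current one.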

  • XML Schemas
    • Version 3.0 (Official)
      • Version 3.0 (renamed from 2.2) includes the following changes:
        1. Version added to xsd:schema.
        2. SCHEMAVERSION attribute added to alto element.
        3. documentIdentifier element added to sourceImageInformationType element.
        4. documentIdentifierLocation attribute added to documentIdentifier element.
        5. Anonymous types changed to named types (to allow use of xsd:redefine mechanism)
    • Version 2.1 (Official)
      • Version 2.1 includes the following changes:
        1. Page and BlockType element HEIGHT, WIDTH, HPOS, VPOS attribute types changed to xsd:float from xsd:int.
        2. CircleType HPOS, VPOS and RADIUS attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        3. EllipseType HPOS, VPOS, HLENGTH and VLENGTH attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        4. MeasurementUnit defined as mandatory and element annotation clarified.
        5. HYP element's CONTENT attribute type definition added as xsd:string.
        6. Tags (LayoutTag/StructureTag/RoleTag/NamedEntityTag/OtherTag) added to allow for tagging content. TAGREFS attribute added to BlockTypes, TextLine and String.
        7. CS attribute added to String and Block.
        8. LANG attribute added to String, TextLine and TextBlock. Language attribute in TextBlock deprecated.
    • Version 2.0
      • The Library of Congress now serves as the official maintenance agency for ALTO versions 2.0 and higher. The current 2.0 version of the schema entirely implements the previous 1.4 version of the schema with two exceptions: the addition of a loc.gov namespace URI, and an updated URI import reference for the inclusion of XLink functionality.
    • Version 1.4
      • The same schema as version 2.0, prior to conversion to the LOC namespace.
    • Version 1.x
      • Previous versions up to 1.4 were maintained and hosted by CCS Content Conversion Specialists (http://www.content-conversion.com).
    • Library of Congress XLink Schema
      • The standard XLink Schema used in all applicable Library of Congress XML Schemas. The XLink Schema is imported into the ALTO 2.0 schema by default.
    • Technical Center
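Two of the changes listed above affect how a consumer reads an ALTO file: the SCHEMAVERSION attribute on the alto element (version 3.0) and the move of position and size attributes from xsd:int to xsd:float (version 2.1). The sketch below shows one way to handle both with Python's standard library. It rests on assumptions not stated on this page: the namespace URI is taken to be `http://www.loc.gov/standards/alto/ns-v3#`, the sample document is a simplified fragment that has not been validated against the schema, and `read_block_geometry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ALTO version 3; the page only names it "ns-v3".
ALTO_V3_NS = "http://www.loc.gov/standards/alto/ns-v3#"

# Simplified, unvalidated fragment for illustration only.
SAMPLE_GEOMETRY = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" SCHEMAVERSION="3.0">
  <Layout>
    <Page ID="P1" HEIGHT="3300.5" WIDTH="2550.0">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100.25" VPOS="200.5" WIDTH="500.0" HEIGHT="40.0" LANG="en"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def read_block_geometry(xml_text):
    """Return the declared schema version and each TextBlock's bounding box."""
    root = ET.fromstring(xml_text)
    # SCHEMAVERSION was added in 3.0, so this is None for older files.
    version = root.get("SCHEMAVERSION")
    ns = {"a": ALTO_V3_NS}
    blocks = []
    for block in root.iterfind(".//a:TextBlock", ns):
        # Parse as float, not int: since 2.1 these attributes are xsd:float.
        box = tuple(float(block.get(name))
                    for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        blocks.append({"id": block.get("ID"),
                       "lang": block.get("LANG"),
                       "box": box})
    return version, blocks


version, blocks = read_block_geometry(SAMPLE_GEOMETRY)
print(version, blocks)
```

Parsing coordinates with `float` also accepts the integer values found in files written against earlier schema versions, so the same reader can cover both.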

ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the Metadata Encoding and Transmission Schema (METS) administrative metadata section. However, an ALTO instance can also exist as a standalone document used independently of METS.
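As a rough illustration of a standalone ALTO document carrying both layout and content, the following Python sketch recovers the text of each line from a small sample. The sample is a simplified fragment that has not been validated against the schema. The assumptions are the namespace URI and the convention that each TextLine holds String elements whose CONTENT attribute carries the recognized word. The name `page_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ALTO version 3.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Simplified, unvalidated standalone fragment for illustration only.
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Analyzed"/><SP/><String CONTENT="Layout"/>
          </TextLine>
          <TextLine>
            <String CONTENT="and"/><SP/><String CONTENT="Text"/><SP/><String CONTENT="Object"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def page_text(xml_text):
    """Return one string per TextLine, joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//alto:TextLine", NS):
        # Only direct String children are read; SP (space) elements are
        # represented by the single space used to join the words.
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", NS)]
        lines.append(" ".join(words))
    return lines


print(page_text(SAMPLE_PAGE))
```

When the same ALTO content is embedded in a METS file, the alto element would sit inside the METS wrapper rather than at the document root, so a reader would need to locate it there before applying the same traversal.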

Aug 25, 2014