ALTO Technical Metadata for Layout and Text Objects

  • About ALTO
  • Community
  • Versions, Namespaces and Schema Locations

    ALTO schema versions change by a whole number for changes that break backward compatibility (for example, version 1 to version 2) and by a decimal for changes that do not (for example, 2.0 to 2.1). The namespace changes only with a major version (for example, ns-v2 to ns-v3). Schema files follow this pattern: each major version has its own subdirectory at www.loc.gov/alto, and the current minor version of the schema is named alto.xsd in that directory.
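The versioning rule above can be expressed as a small check. This is a minimal sketch that assumes versions are written as "major.minor" strings. The namespace URI pattern `http://www.loc.gov/standards/alto/ns-v<major>#` is an assumption extrapolated from the "ns-v2"/"ns-v3" naming and should be checked against the schema you target.

```python
# Sketch of the ALTO versioning rule: the major (whole) number changes only for
# backward-incompatible releases, and only a major change alters the namespace.


def parse_version(version: str) -> tuple:
    """Split a "major.minor" string such as "4.4" into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def namespace_for(version: str) -> str:
    """Namespace URI for a version (pattern assumed: ns-v<major>#)."""
    major, _ = parse_version(version)
    return f"http://www.loc.gov/standards/alto/ns-v{major}#"


def is_backward_compatible(old: str, new: str) -> bool:
    """True when moving from old to new stays within one major version."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    return old_major == new_major and new_minor >= old_minor


print(is_backward_compatible("2.0", "2.1"))  # True: minor (decimal) change
print(is_backward_compatible("2.1", "3.0"))  # False: major (whole number) change
print(namespace_for("2.1") == namespace_for("2.0"))  # True: namespace unchanged
```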

    There will also be a copy for direct reference to each minor version within that directory, including an exact copy of the current version. For example, the once-current version 3.0 was available at both http://www.loc.gov/alto/v3/alto.xsd and http://www.loc.gov/alto/v3/alto-3-0.xsd. All previous versions will remain available in other subdirectories under their fully versioned names.
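A minimal sketch of the schema-location pattern just described. The `is_current` flag is a hypothetical parameter: which minor version is current changes over time, so the caller must supply it. For version 3.0 the function reproduces the two URLs given in the example above.

```python
BASE = "http://www.loc.gov/alto"


def schema_locations(version: str, is_current: bool) -> list:
    """Return the schema URLs for a "major.minor" version.

    Each major version has its own v<major> subdirectory; every minor version
    has a fully versioned file alto-<major>-<minor>.xsd, and the current minor
    version is additionally available as alto.xsd in the same directory.
    """
    major, minor = version.split(".")
    directory = f"{BASE}/v{major}"
    urls = [f"{directory}/alto-{major}-{minor}.xsd"]
    if is_current:
        urls.insert(0, f"{directory}/alto.xsd")
    return urls


print(schema_locations("3.0", is_current=True))
# ['http://www.loc.gov/alto/v3/alto.xsd', 'http://www.loc.gov/alto/v3/alto-3-0.xsd']
```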

    To ensure this does not break current practice, the version http://www.loc.gov/alto/alto.xsd (version 2.1 in August 2014) will continue to exist, but will no longer be updated to reflect the current version.

  • XML Schemas
    • Version 4.4 (Official)
      • Version 4.4 includes the following changes:
        • Add LANG attribute at the PageType level to describe the default language used in the document.
        • Add ROTATION attribute at the PageType level to describe the default rotation used in the document.
        • Add OTHERLANGS attribute on PageType to summarize all the languages present in a particular document.
        • Adapt "PointsType" documentation.
        • Adapt XLink attribute group documentation on "BlockType".
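A minimal sketch of how an OTHERLANGS summary could be derived from the LANG values already present in a page, using Python's standard `xml.etree.ElementTree`. Two points are assumptions, not taken from the schema: that OTHERLANGS holds a space-separated list of language codes, and that the page's own default LANG is left out of that list.

```python
import xml.etree.ElementTree as ET

# Simplified Page fragment; the ALTO namespace is omitted for brevity.
SAMPLE = """
<Page ID="P1" LANG="en">
  <PrintSpace>
    <TextBlock ID="B1">
      <TextLine ID="TL1">
        <String CONTENT="Hello" LANG="en"/>
        <String CONTENT="Bonjour" LANG="fr"/>
        <String CONTENT="Hallo" LANG="de"/>
        <String CONTENT="Salut" LANG="fr"/>
      </TextLine>
    </TextBlock>
  </PrintSpace>
</Page>
"""


def summarize_other_langs(page: ET.Element) -> str:
    """Return the distinct LANG values found below the page, in document order.

    The page default LANG is excluded and the result is space-separated;
    both choices are assumptions of this sketch.
    """
    default = page.get("LANG")
    found = []
    for element in page.iter():
        if element is page:
            continue  # skip the Page element itself
        lang = element.get("LANG")
        if lang and lang != default and lang not in found:
            found.append(lang)
    return " ".join(found)


page = ET.fromstring(SAMPLE.strip())
page.set("OTHERLANGS", summarize_other_langs(page))
print(page.get("OTHERLANGS"))  # fr de
```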
    • Version 4.3
      • Version 4.3 includes the following changes:
        • Add BASEDIRECTION attribute defining base direction and line orientation to TextLine and BlockType.
        • Add support for explicit reading order definitions with "ReadingOrder" element containing "UnorderedGroup"s, "OrderedGroup"s, and "ElementRef"s
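A reading-order definition can be loaded into nested Python values as a sketch: ordered groups become tuples, unordered groups become frozensets, and each ElementRef becomes the ID it points to. That the referenced ID sits in a `REF` attribute is an assumption, as are the sample IDs, which are for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified ReadingOrder fragment; namespace omitted, IDs illustrative.
SAMPLE = """
<ReadingOrder>
  <OrderedGroup ID="RO1">
    <ElementRef ID="ER1" REF="B1"/>
    <UnorderedGroup ID="RO2">
      <ElementRef ID="ER2" REF="B2"/>
      <ElementRef ID="ER3" REF="B3"/>
    </UnorderedGroup>
    <ElementRef ID="ER4" REF="B4"/>
  </OrderedGroup>
</ReadingOrder>
"""


def local_name(element: ET.Element) -> str:
    """Strip a '{namespace}' prefix so the code works with or without one."""
    return element.tag.rsplit("}", 1)[-1]


def read_group(element: ET.Element):
    """Map OrderedGroup to a tuple, UnorderedGroup to a frozenset, and
    ElementRef to the ID it references (assumed to be in REF)."""
    name = local_name(element)
    if name == "ElementRef":
        return element.get("REF")
    members = [read_group(child) for child in element]  # groups may nest
    if name == "OrderedGroup":
        return tuple(members)
    if name == "UnorderedGroup":
        return frozenset(members)
    raise ValueError(f"unexpected element in ReadingOrder: {name}")


reading_order = ET.fromstring(SAMPLE.strip())
structure = [read_group(group) for group in reading_order]
print(structure)
```

Tuples keep the declared sequence while frozensets deliberately carry no order, so code that later linearizes the result has to decide how to place members of an unordered group.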
    • Version 4.2
      • Version 4.2 includes the following changes:
        • Change BASELINE to accommodate a list of points in addition to a single point.
        • Make FONTSIZE optional.
        • Add "strikethrough" to the list of allowed values for FONTSTYLE.
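A tolerant parser for the two BASELINE forms is sketched below. The point syntax it accepts ("x,y" pairs separated by whitespace) is an assumption; check it against the PointsType documentation of the schema version you target.

```python
from typing import List, Tuple, Union

Point = Tuple[float, float]


def parse_baseline(value: str) -> Union[float, List[Point]]:
    """Parse an ALTO BASELINE attribute value.

    Assumed forms: a single number (the older single-point style) or a list
    of "x,y" points separated by whitespace (the list-of-points style).
    """
    tokens = value.split()
    if len(tokens) == 1 and "," not in tokens[0]:
        return float(tokens[0])  # single value
    points = []
    for token in tokens:
        x, y = token.split(",")  # raises ValueError on malformed pairs
        points.append((float(x), float(y)))
    return points


print(parse_baseline("412"))              # 412.0
print(parse_baseline("100,412 350,415"))  # [(100.0, 412.0), (350.0, 415.0)]
```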
    • Version 4.1
      • Version 4.1 includes the following changes:
        • Fix for Processing, including processingStepType.
        • Add missing PROCESSINGREFS to PageType, PageSpaceType, BlockType, TextLine, and StringType for referencing the Processing history.
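A sketch of resolving PROCESSINGREFS to the Processing elements it names, treating the attribute as a whitespace-separated list of IDs (an assumption consistent with an IDREFS-style value). The sample document is simplified and its IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified document: namespace omitted, Processing contents left out.
SAMPLE = """
<alto>
  <Description>
    <Processing ID="proc1"/>
    <Processing ID="proc2"/>
  </Description>
  <Layout>
    <Page ID="P1" PROCESSINGREFS="proc1">
      <PrintSpace>
        <TextBlock ID="B1" PROCESSINGREFS="proc1 proc2"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(element):
    """Strip a '{namespace}' prefix if present."""
    return element.tag.rsplit("}", 1)[-1]


def processing_index(root):
    """Map each Processing element's ID to the element."""
    return {
        el.get("ID"): el
        for el in root.iter()
        if local_name(el) == "Processing" and el.get("ID")
    }


def resolve_processing(element, index):
    """Return the Processing elements named in PROCESSINGREFS, in order.

    An unknown ID raises KeyError instead of being silently dropped.
    """
    refs = element.get("PROCESSINGREFS", "").split()
    return [index[ref] for ref in refs]


root = ET.fromstring(SAMPLE.strip())
index = processing_index(root)
block = root.find(".//TextBlock")
print([p.get("ID") for p in resolve_processing(block, index)])  # ['proc1', 'proc2']
```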
    • Version 4.0
      • Version 4.0 includes the following changes:
        1. Clarification and definition of the licensing under the common standard "CC BY-SA 4.0" for this ALTO standard (with the agreement of the authors).
        2. Added character-based text description with the new Glyph element and its subelement Variant (GlyphType, VariantType).
        3. Extended annotation clarifying the difference between the existing ALTERNATIVE element and Glyph/Variant.
        4. Introduce generic "Processing" and "processingStep" and deprecate "OcrProcessing".
        5. Fix for the Shape element: Shape can now be used only once within a PageSpace or a TextLine, as was originally intended.
        6. Summary of changes
        7. Comments about the schema and its documentation, as well as additional use cases for the new schema features, are encouraged (GitHub account required).
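The single-Shape rule is enforced by schema validation; as a rough, non-validating illustration, the sketch below flags any element that has more than one direct Shape child. Shape contents are left out of the sample, so the fragment itself is not schema-valid.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: namespace omitted, Shape contents left out.
SAMPLE = """
<PrintSpace ID="PS1">
  <Shape/>
  <TextBlock ID="B1">
    <TextLine ID="TL1">
      <Shape/>
      <Shape/>
    </TextLine>
  </TextBlock>
</PrintSpace>
"""


def local_name(element):
    """Strip a '{namespace}' prefix if present."""
    return element.tag.rsplit("}", 1)[-1]


def repeated_shapes(root):
    """Return IDs (or tag names) of elements with more than one direct Shape child."""
    offenders = []
    for element in root.iter():
        count = sum(1 for child in element if local_name(child) == "Shape")
        if count > 1:
            offenders.append(element.get("ID") or local_name(element))
    return offenders


root = ET.fromstring(SAMPLE.strip())
print(repeated_shapes(root))  # ['TL1']
```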
    • Version 3.1
      • Version 3.1 includes the following changes:
        1. Added support for using different shapes for the elements String and TextLine, all PageSpaceType elements, and all BlockType elements.
        2. The description of the ROTATION attribute is changed to refer to the rotation of the contents of a block, not the block itself. The attribute is inherited by all sub-elements.
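Because ROTATION is inherited by all sub-elements, a consumer needs the nearest value set on an element or its ancestors. This sketch assumes that the nearest value wins (values are not summed down the tree) and that the rotation is 0 when none is set; both are assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified fragment; namespace omitted, IDs illustrative.
SAMPLE = """
<PrintSpace>
  <TextBlock ID="B1" ROTATION="90">
    <TextLine ID="TL1">
      <String ID="S1" CONTENT="Sideways"/>
    </TextLine>
  </TextBlock>
  <TextBlock ID="B2">
    <TextLine ID="TL2">
      <String ID="S2" CONTENT="Upright"/>
    </TextLine>
  </TextBlock>
</PrintSpace>
"""


def parent_map(root):
    """ElementTree has no parent pointers, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def effective_rotation(element, parents, default=0.0):
    """Return the nearest ROTATION value on the element or its ancestors."""
    current = element
    while current is not None:
        value = current.get("ROTATION")
        if value is not None:
            return float(value)
        current = parents.get(current)  # None once we pass the root
    return default


root = ET.fromstring(SAMPLE.strip())
parents = parent_map(root)
for string in root.iter("String"):
    print(string.get("ID"), effective_rotation(string, parents))
# S1 90.0
# S2 0.0
```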
    • Version 3.0
      • Version 3.0 (renamed from 2.2) includes the following changes:
        1. Version added to xsd:schema.
        2. SCHEMAVERSION attribute added to the alto element.
        3. documentIdentifier element added to the sourceImageInformationType element.
        4. documentIdentifierLocation attribute added to the documentIdentifier element.
        5. Anonymous types changed to named types (to allow use of the xsd:redefine mechanism).
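With SCHEMAVERSION on the root element, a reader can compare the declared version with the major version encoded in the namespace. A minimal sketch, assuming the `ns-v<major>#` namespace form; files written against versions older than 3.0 carry no SCHEMAVERSION, and this check then returns False.

```python
import re
import xml.etree.ElementTree as ET


def alto_version_info(root):
    """Return (namespace URI or None, SCHEMAVERSION or None) for a root element."""
    match = re.match(r"\{([^}]*)\}", root.tag)
    namespace = match.group(1) if match else None
    return namespace, root.get("SCHEMAVERSION")


def versions_agree(root) -> bool:
    """True if the SCHEMAVERSION major number matches the namespace's ns-v<major>."""
    namespace, version = alto_version_info(root)
    if namespace is None or version is None:
        return False
    ns_match = re.search(r"ns-v(\d+)#?$", namespace)
    return bool(ns_match) and ns_match.group(1) == version.split(".")[0]


doc = ET.fromstring(
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" SCHEMAVERSION="3.0"/>'
)
print(alto_version_info(doc))  # ('http://www.loc.gov/standards/alto/ns-v3#', '3.0')
print(versions_agree(doc))     # True
```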
    • Version 2.1
      • Version 2.1 includes the following changes:
        1. Page and BlockType element HEIGHT, WIDTH, HPOS, and VPOS attribute types changed from xsd:int to xsd:float.
        2. CircleType HPOS, VPOS, and RADIUS attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        3. EllipseType HPOS, VPOS, HLENGTH, and VLENGTH attribute type definitions added as xsd:float and made mandatory. Element annotation clarified.
        4. MeasurementUnit defined as mandatory and element annotation clarified.
        5. HYP element's CONTENT attribute type definition added as xsd:string.
        6. Tags (LayoutTag/StructureTag/RoleTag/NamedEntityTag/OtherTag) added to allow tagging of content. TAGREFS attribute added to BlockTypes, TextLine, and String.
        7. CS attribute added to String and Block.
        8. LANG attribute added to String, TextLine, and TextBlock. Language attribute in TextBlock deprecated.
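Since MeasurementUnit is mandatory and coordinate attributes are xsd:float, coordinates can be parsed as floats and converted to pixels. A sketch assuming the unit values `pixel`, `mm10` (tenths of a millimeter) and `inch1200` (1/1200 inch); the image resolution `dpi` is a hypothetical input that has to come from elsewhere, such as the source image metadata.

```python
def to_pixels(value: float, unit: str, dpi: float) -> float:
    """Convert a coordinate expressed in MeasurementUnit to pixels at a given dpi."""
    if unit == "pixel":
        return value
    if unit == "mm10":
        # 1/10 mm -> inches: value / 10 / 25.4, i.e. value / 254
        return value * dpi / 254.0
    if unit == "inch1200":
        # 1/1200 inch -> inches: value / 1200
        return value * dpi / 1200.0
    raise ValueError(f"unknown MeasurementUnit: {unit!r}")


hpos = float("1200")  # attribute values are strings; parse them as xsd:float
print(to_pixels(hpos, "inch1200", dpi=300))  # 300.0
```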
    • Version 2.0
      • The Library of Congress now serves as the official maintenance agency for ALTO versions 2.0 and higher. Version 2.0 of the schema implements the previous version 1.4 in full, with two exceptions: the addition of a loc.gov namespace URI, and an updated import reference URI for the inclusion of XLink functionality.
    • Version 1.4
      • The same schema as version 2.0, prior to conversion to the LOC namespace.
    • Version 1.x
      • Previous versions up to 1.4 were maintained and hosted by CCS Content Conversion Specialists (https://www.content-conversion.com).
    • Library of Congress XLink Schema
      • The standard XLink Schema used in all applicable Library of Congress XML Schemas. The XLink Schema is imported into the ALTO 2.0 schema by default.
  • Technical Center