Text Services

API for accessing full text OCR, word coordinates and context snippets on loc.gov

Introduction

The text-services API provides access to the full text, word coordinates, and context snippets for certain image-based sets of content on loc.gov. These content sets include, but are not limited to, newspapers and selected digitized books.

Using the API

The text services API is available at:
https://tile.loc.gov/text-services/word-coordinates-service

At minimum, the following parameters are required to use this API:

  • segment - Local identifier for this piece of content
  • format - Content format
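As a minimal sketch, a request URL carrying the two required parameters could be assembled with the Python standard library as shown here. The function name build_request_url is hypothetical, and leaving the slashes in the segment path unescaped is a readability choice rather than a documented requirement of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://tile.loc.gov/text-services/word-coordinates-service"


def build_request_url(segment: str, fmt: str) -> str:
    """Return a request URL with the two required parameters (hypothetical helper)."""
    # safe="/" keeps the slashes of the segment path readable in the URL.
    query = urlencode({"segment": segment, "format": fmt}, safe="/")
    return f"{BASE_URL}?{query}"


url = build_request_url(
    "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml",
    "alto_xml",
)
print(url)
```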

Rate limits

For rate limits, see Working within Limits

Word Coordinates

Given only the segment and format parameters, the API returns the word coordinates for the entire segment:
https://tile.loc.gov/text-services/word-coordinates-service?segment=/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml&format=alto_xml

Note that the API also supports multiple segments, each with a corresponding format.

Word coordinate request for multiple segments

Below is the JSON response. The keys are the identifiers for each of the segments, and the values are the word coordinate data.


{
    "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml": {
        "coords": [
            .
            .
            .
            "abilene": [
                {
                    "coordinates":[
                        17908,
                        5652,
                        224,
                        92
                    ],
                    "language": "eng",
                    "position": [17,32],
                    "term": "abilene"
                }
            ],
            .
            .
            .
            "zero": [
                {
                    "coordinates": [
                        17444,
                        5980,
                        252,
                        64
                    ]
                }
            ]
        ],
        "height": "23360",
        "width": "21972"
    },

    "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0754.xml": {
        "coords": [
        .
        .
        .
        "aares": [{
            "coordinates": [
                13456,
                20892,
                260,
                76
            ],
            "language": "eng",
            "position": [30,971],
            "term": "aares"
        }],
        .
        .
        .
        ]
    }
}
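The following Python sketch shows one way such a response could be walked once it has been decoded. It rests on assumptions that the page does not confirm: that "coords" decodes to a mapping from term to a list of occurrences (the sample prints it with square brackets, but its entries are keyed by term), that the four coordinate values are x, y, width, and height in that order, and that they share the pixel space of the page "width" and "height". The function name iter_word_boxes is hypothetical.

```python
SEGMENT = "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml"

# Trimmed stand-in for a decoded response; a real response holds many more terms.
sample_response = {
    SEGMENT: {
        "coords": {
            "abilene": [
                {
                    "coordinates": [17908, 5652, 224, 92],
                    "language": "eng",
                    "position": [17, 32],
                    "term": "abilene",
                }
            ],
        },
        "height": "23360",
        "width": "21972",
    }
}


def iter_word_boxes(response):
    """Yield (segment, term, relative_box) for every occurrence of every term."""
    for segment, data in response.items():
        page_w = int(data["width"])   # dimensions arrive as strings
        page_h = int(data["height"])
        for term, hits in data["coords"].items():
            for hit in hits:
                # Assumed order: x, y, width, height.
                x, y, w, h = hit["coordinates"]
                yield segment, term, (x / page_w, y / page_h, w / page_w, h / page_h)


boxes = list(iter_word_boxes(sample_response))
for segment, term, box in boxes:
    print(term, [round(v, 4) for v in box])
```

Expressing each box as a fraction of the page size makes it easier to overlay highlights on an image that has been scaled for display.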

Plain Text OCR

Appending full_text=1 to the request parameters returns the plain text OCR (full text) for a given segment.

https://tile.loc.gov/text-services/word-coordinates-service?segment=/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml&format=alto_xml&full_text=1

The response is the following JSON, which includes the full text as well as the height and width of the image that the OCR represents.


{
    "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml": {
        "full_text": "The Evening Newspaper of Kansas yyEATHER FORECAST for Kansas: Fair and Warmer tonight; Thurs day increasing cloudincs and warmer. HOME EDITION TOPEKA, KANSAS, WEDNESDAY EVENING, DECEMBER 10, 1919-TEN PAGES THREE CENTS After Many Hours of Stormy Sessions at Big Battle Comes To a Satisfactory End IFiP 1 3f 11 JLL&iL Ik Indianapolis State Legislative Measure Would Provide Industrial Court of Three Members Appointed by Governor to Settle Dis putes and Halt Tie-Ups in Production of Essentials. AGITATORS MAY FACE FINES AND TERM IN PRISON Provisions to Affect Employers as Well as Workers...",
        "height": "23360",
        "width": "21972"
    }
}
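A response of this shape could be reduced to a segment-to-text mapping along the following lines. This is only a sketch: fetch_json and full_text_by_segment are hypothetical helper names, the sample payload is a shortened copy of the response above, and fetch_json is defined but deliberately not called so that the example runs without network access.

```python
import json
from typing import Dict
from urllib.request import urlopen

SEGMENT = "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode its JSON body (not called in this sketch)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def full_text_by_segment(response: dict) -> Dict[str, str]:
    """Map each segment identifier to its plain text OCR."""
    return {segment: body["full_text"] for segment, body in response.items()}


# Shortened copy of the sample response; the real full_text is much longer.
sample = json.loads(json.dumps({
    SEGMENT: {
        "full_text": "The Evening Newspaper of Kansas",
        "height": "23360",
        "width": "21972",
    }
}))

texts = full_text_by_segment(sample)
for segment, text in texts.items():
    print(segment, "->", len(text.split()), "words")
```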


Byte Range Request

Sometimes only a portion of the OCR text in a given object is needed. The text services API supports byte range requests through the byte_range parameter.

Example: https://tile-dev.loc.gov/text-services/word-coordinates-service?segment=/service/gdc/dcmsiabooks/hi/sg/ra/ce/00/no/rr/hisgrace00norr/hisgrace00norr_djvu.xml&format=djvu_xml&byte_range=63701-81694&full_text=1

JSON Response:


{
    "/service/gdc/dcmsiabooks/hi/sg/ra/ce/00/no/rr/hisgrace00norr/hisgrace00norr_djvu.xml": {
        "full_text": "8 HIS GRACE. and it seems almost necessary, for purposes of clear- ness, that I should start by being a little egotistical. I must just mention that my father had been a wealthy merchant who had failed in business late in life and had died shortly afterwards, leaving his widow and his two children with only a few hundreds a year where- with to engage upon the struggle for existence I must add that I, who had been originally intended for the Guards, was compelled by force of circumstances to accept Uncle John’s suggestions with gratitude and I suppose I had better also confess without more ado that I had a certain facility for the composition of poetry. Nobody, I am sure, will be so unkind as to grudge me the privilege of calling my compositions poetry, because nobody who reads these lines is in the very least likely to have ever perused my poems. They have been published but my publisher assures me and I can well believe him that they have at no time had a wide circulation. At that time, however, it did not seem to me an impossible thing that the public might eventually recognize some merit in my attempts at versification and even go so far as to pay me for the same so that, on my way from the City to St. James’s Street I asked myself quite seriously whether it was not my probable mission in life to be a poet in a humble fashion. I have since discovered that my mission in life is essentially prosaic. One makes these discov- eries in the course of a year or two, and they are doubtless salutary, if they are not precisely agreeable. The first thing that I saw, after reaching my club and picking up one of the evening papers, was that the Duke of Hurstbourne was dead. The announce- ment interested me and served to divert my thoughts ",
        "height": "3462",
        "width": "2178"
    }
}
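The sketch below shows how a byte_range value of the form n-m might be validated and added to a request. The helper names are hypothetical, and the page does not state whether the end offset is inclusive, so the code only checks that the range is well formed. It uses the base URL given under "Using the API" rather than the host in the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://tile.loc.gov/text-services/word-coordinates-service"


def byte_range_value(start: int, end: int) -> str:
    """Format a byte range as 'n-m', rejecting negative or reversed ranges."""
    if start < 0 or end < start:
        raise ValueError(f"invalid byte range: {start}-{end}")
    return f"{start}-{end}"


def build_byte_range_url(segment: str, fmt: str, start: int, end: int) -> str:
    """Build a full-text request limited to a byte range (hypothetical helper)."""
    params = {
        "segment": segment,
        "format": fmt,
        "byte_range": byte_range_value(start, end),
        "full_text": "1",
    }
    # safe="/" keeps the segment path readable; dict order is preserved.
    return f"{BASE_URL}?{urlencode(params, safe='/')}"


url = build_byte_range_url(
    "/service/gdc/dcmsiabooks/hi/sg/ra/ce/00/no/rr/hisgrace00norr/hisgrace00norr_djvu.xml",
    "djvu_xml",
    63701,
    81694,
)
print(url)
```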

Context Snippets & Search Term Highlighting

When the q parameter is present and the relevant_snippet parameter is set, the text services API returns a short passage of text around the given keyword(s). Terms that match the keyword(s) are highlighted with special markup tags, and the word coordinates for each keyword are included in the response.

Example:
https://tile-dev.loc.gov/text-services/word-coordinates-service?segment=/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml&format=alto_xml&q='abilene'&relevant_snippet=1

JSON Response:


{
    "/service/ndnp/khi/batch_khi_bender_ver01/data/sn82016014/00212474198/1919121001/0753.xml": {
        "height": "23360",
        "relevant_snippet": "... Joseph, Mo 1.3 Middle Division. [[tag]]Abilene[[/tag]]. d Caldwell 1.0 Kllfmorth 2.(1 Florence. 1.5 Hutchinson; ..t '1.0 Newton 2.0 Pbillipnburg S.O Preston 2.0 Russell 3.0 Western Division. Blakemnn. 4.0 Ieerf!eld 2.0 Garden City 3 0 Gove 3.0 Hill City 3.5 Hugoton 2.0 Leott 5.0 Mlnneola 00 New Ulvsses. T Scott City. 3.0 Trtbnne 0.0 Wakeeney 4.0 THEY LIKE WILSON i Mexicans Pleased To See Presi dent Take Charge of Case. His Policies Will ot Lead to Invasion, Sajs Official. Mexico City, Dec. 9. The state court of Puebla was to take up today request of William O. Jenkins. Ameri can consular a pent, for annulment of his $500 bail and his readmission to the penitentiary there. ...",
        "searchTerms": {
            "abilene": [
                [
                    16184,
                    11380,
                    428,
                    80
                ]
            ]
        },
        "width": "21972"
    }
}
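A client will usually want to turn the highlight markers into something it can display. The sketch below assumes the markers appear literally as [[tag]] and [[/tag]], as in the sample response above; the helper names and the choice of the HTML mark element are illustrative only.

```python
import html
import re

# Matches one highlighted run between the literal markers seen in the sample.
TAG_RE = re.compile(r"\[\[tag\]\](.*?)\[\[/tag\]\]", re.DOTALL)


def highlighted_terms(snippet: str) -> list:
    """Return the text of every highlighted run, in order of appearance."""
    return TAG_RE.findall(snippet)


def snippet_to_html(snippet: str) -> str:
    """Escape the OCR text, then swap the markers for <mark> elements."""
    # Escaping first keeps stray '<' or '&' in the OCR from becoming markup;
    # html.escape leaves the square-bracket markers untouched.
    escaped = html.escape(snippet)
    return TAG_RE.sub(r"<mark>\1</mark>", escaped)


# Shortened copy of the relevant_snippet value from the sample response.
snippet = "... Joseph, Mo 1.3 Middle Division. [[tag]]Abilene[[/tag]]. d Caldwell 1.0 ..."

print(highlighted_terms(snippet))
print(snippet_to_html(snippet))
```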

Query Parameters

Parameter: segment
Description: The path to the document(s). This parameter can be repeated in the query string to request multiple segments.
Examples:
  • ?segment=/path/to/segment/
  • ?segment=/path/to/segment_1/&segment=/path/to/segment_2/

Parameter: format
Description: The format of the segment to parse. Available formats include:
  • plain_text
  • alto_xml
  • djvu_json
  • djvu_xml
Example: ?segment=/path/to/segment/&format=alto_xml

Parameter: full_text
Description: Returns the full text of a segment when the value is 1.
Example: ?segment=/path/to/segment/&format=alto_xml&full_text=1

Parameter: relevant_snippet
Description: Returns a context snippet when the value is 1.
Example: ?segment=/path/to/segment/&format=alto_xml&relevant_snippet=1

Parameter: byte_range
Description: Returns the chunk of the segment represented by the byte range. The value takes the form n-m, where n and m are integers.
Example: ?segment=/path/to/segment/&format=djvu_xml&byte_range=n-m

Parameter: q
Description: The search query.
Example: ?segment=/path/to/segment/&format=alto_xml&full_text=1&q=dogs
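The parameters above can be combined in a single query string. The following sketch is one possible way to do so in Python; build_query is a hypothetical helper, and the interleaving of each segment with its own format for multi-segment requests is an assumption, since this page does not spell out how formats are paired with segments.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode


def build_query(
    segments: Sequence[Tuple[str, str]],
    full_text: bool = False,
    relevant_snippet: bool = False,
    byte_range: Optional[str] = None,
    q: Optional[str] = None,
) -> str:
    """Build a query string from (segment, format) pairs and optional flags."""
    pairs: List[Tuple[str, str]] = []
    for segment, fmt in segments:
        # Assumption: each segment is followed directly by its format.
        pairs.append(("segment", segment))
        pairs.append(("format", fmt))
    if full_text:
        pairs.append(("full_text", "1"))
    if relevant_snippet:
        pairs.append(("relevant_snippet", "1"))
    if byte_range is not None:
        pairs.append(("byte_range", byte_range))
    if q is not None:
        pairs.append(("q", q))
    # A list of tuples keeps repeated keys and their order intact.
    return "?" + urlencode(pairs, safe="/")


single = build_query([("/path/to/segment/", "alto_xml")], full_text=True, q="dogs")
multi = build_query([
    ("/path/to/segment_1/", "alto_xml"),
    ("/path/to/segment_2/", "alto_xml"),
])
print(single)
print(multi)
```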
