JSON/YAML for LoC.gov

Provides structured data about Library of Congress collections.

Introduction

The loc.gov API provides a means to retrieve structured data about Library of Congress collections in popular representations such as JSON and YAML for easy use in software programs and analysis tools.

The API is accessible to the public with no API key or authentication required; however, clients are strongly encouraged to rate-limit their own requests. Requests that exceed the rate that loc.gov can successfully accommodate will be blocked to prevent a denial of service.

The current rate limits are:

Newspapers endpoint:
  • Burst Limit: 20 requests per 1 minute, Block for 5 minutes
  • Crawl Limit: 20 requests per 10 seconds, Block for 1 hour
Item and resource endpoints:
  • Burst Limit: 40 requests per 10 seconds, Block for 5 minutes
  • Crawl Limit: 200 requests per 1 minute, Block for 1 hour
Collections, format, and other endpoints:
  • Burst Limit: 20 requests per 10 seconds, Block for 5 minutes
  • Crawl Limit: 80 requests per 1 minute, Block for 1 hour
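
One way to stay under these limits is to throttle requests on the client side. The following is a minimal sketch, not part of the loc.gov API: the class name SlidingWindowThrottle and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_requests calls to wait() per window_seconds."""

    def __init__(self, max_requests, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of the most recent requests

    def wait(self):
        """Block until another request fits in the window, then record it."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.window_seconds - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

For example, SlidingWindowThrottle(1, 1) allows one request per second (10 per 10 seconds, 60 per minute), which is below both limits listed for the collections, format, and other endpoints. Call wait() immediately before each request.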

Accessible information includes:

  • items (books, archived websites, photos, and videos)
  • collections (groups of items selected by librarians according to a theme or other principle)
  • images (thumbnails and higher resolution formats)

The Basics

The loc.gov API is used by adjusting parameters in the query string: the portion at the end of the URL following a ?, where each entry is a key-value pair known as a query parameter. Multiple parameters in the same query are delimited by an & (e.g., /?query1=foo&query2=bar). To access the API, use the fo parameter: fo=json returns data in the JSON format, and fo=yaml returns data in the YAML format.

Example: https://www.loc.gov/?fo=json, where https://www.loc.gov/ is the root URL.
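
The query-string rules above can be sketched in Python with the standard library. The helper name build_url is an assumption made for illustration; only the root URL and the fo parameter come from this page.

```python
from urllib.parse import urlencode

ROOT_URL = "https://www.loc.gov"


def build_url(path, fmt="json", **params):
    """Return a loc.gov URL for path with fo=<fmt> appended to the query."""
    query = dict(params)
    query["fo"] = fmt  # fo selects the representation: json or yaml
    return f"{ROOT_URL}{path}?{urlencode(query)}"


print(build_url("/"))                       # https://www.loc.gov/?fo=json
print(build_url("/search/", q="baseball"))  # https://www.loc.gov/search/?q=baseball&fo=json
```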

Making Requests

Resources throughout loc.gov are accessible at different URL endpoints. The most useful endpoints are listed below:

/search/

Searches everything on the www.loc.gov website. This includes items in the collection, legislation, web pages, blog posts, and press releases. To use this endpoint one must include the q=search term query parameter in the query string.

Example: https://www.loc.gov/search/?q=baseball&fo=json

Some subsites and other resources have not been migrated to the API, so the individual pages for some search results must be accessed using traditional text crawling techniques.

/collections/

Returns a list of all the digital collections at the LOC. In the response, collections are listed under the results key. To simplify things, we can make use of another query parameter, at=attribute(s), which limits the response to the named attributes. Putting this together, to fetch a list of all digitized collections, request:

https://www.loc.gov/collections/?fo=json&at=results

/collections/{name of collection}/

Returns the specified digital collection. The name of the collection needs to be in so-called "kebab" case: words separated by hyphens, e.g., abraham-lincoln-papers or baseball-cards. This is known as a URL slug: any part of the URL that comes after the domain, e.g., example.com/foo-bar/, where example.com is the domain and foo-bar is the slug.

Example: https://www.loc.gov/collections/civil-war-maps?fo=json
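
As an illustration of kebab case, the sketch below turns a human-readable name into a slug. The helpers to_slug and collection_url are hypothetical, and the result is only a guess at the real slug: the slug that loc.gov actually uses for a collection should be taken from the /collections/ listing.

```python
import re


def to_slug(name):
    """Lowercase name and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return "-".join(words)


def collection_url(name, fmt="json"):
    """Build a collection URL from a name, assuming the slug matches it."""
    return f"https://www.loc.gov/collections/{to_slug(name)}/?fo={fmt}"


print(to_slug("Civil War Maps"))  # civil-war-maps
```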

/{format}/

Returns items which have the specified original format. Below is a list of formats along with their URL slugs:

  • maps: maps
  • audio recordings: audio
  • photo, print, drawing: photos
  • manuscripts/mixed material: manuscripts
  • newspapers: newspapers
  • film, videos: film-and-videos
  • printed music, such as sheet music: notated-music
  • archived websites: websites

Example: https://www.loc.gov/film-and-videos/?q=dog&fo=json
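
The format list can be kept as a lookup table when building requests. This is a small sketch; the names FORMAT_SLUGS and format_url are illustrative and not part of the API, while the slugs themselves are the ones listed above.

```python
from urllib.parse import urlencode

# Original format -> URL slug, as listed on this page.
FORMAT_SLUGS = {
    "maps": "maps",
    "audio recordings": "audio",
    "photo, print, drawing": "photos",
    "manuscripts/mixed material": "manuscripts",
    "newspapers": "newspapers",
    "film, videos": "film-and-videos",
    "printed music": "notated-music",
    "archived websites": "websites",
}


def format_url(format_name, query=None):
    """Build a JSON request URL for a format endpoint, optionally with q."""
    params = {}
    if query is not None:
        params["q"] = query
    params["fo"] = "json"
    return f"https://www.loc.gov/{FORMAT_SLUGS[format_name]}/?{urlencode(params)}"


print(format_url("film, videos", "dog"))
# https://www.loc.gov/film-and-videos/?q=dog&fo=json
```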

/item/{id}/ and /resource/{id}/

Returns bibliographic information for a single item noted by its identifier. All item records that you find in search results have an id field (the URL for that item’s metadata). Some, but not all, collections have an item_id field, which contains just the unique identifier itself. Since not all collections have item_id, it’s better to rely on the id field and the URL it contains.

Examples:

The structure of the item description returned for one of these requests is described below under Response Objects.
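
Because the id field holds an HTTP URL while loc.gov serves only HTTPS, a client will usually want to rewrite it before requesting the item's JSON. A minimal sketch, assuming the id has the form shown elsewhere on this page (http://www.loc.gov/item/2017645977/); the function name item_json_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def item_json_url(item_id_url):
    """Turn an item's id URL into an HTTPS request URL that returns JSON."""
    parts = urlsplit(item_id_url)
    # Keep host and path, force https, and replace the query with fo=json.
    return urlunsplit(("https", parts.netloc, parts.path, "fo=json", ""))


print(item_json_url("http://www.loc.gov/item/2017645977/"))
# https://www.loc.gov/item/2017645977/?fo=json
```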

Query Parameters

All endpoints in the loc.gov API provide a number of parameters one may use to alter the response, listed in the following tables:

Control Parameters

Parameters for controlling the response output.


fo

Specify the format of the returned results.
Options: JSON and YAML.

Examples:


fo=json
fo=yaml


at

attributes to return in the results
This is helpful for removing extraneous information from the results, such as more_like_this and related_items. You can specify elements to exclude using at!=.

Examples:

    
    at=item
    at=item,resources,reproductions
    at!=more_like_this
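
The at and at!= forms can be generated with urlencode by treating "at!" as the key for exclusions. This is a sketch: the helper attribute_query is hypothetical, and joining several excluded attributes with commas is an assumption, since the examples above show only a single exclusion.

```python
from urllib.parse import urlencode


def attribute_query(include=None, exclude=None, fmt="json"):
    """Build a query string using at= (include) and at!= (exclude)."""
    params = {"fo": fmt}
    if include:
        params["at"] = ",".join(include)
    if exclude:
        # Assumption: multiple exclusions are comma-separated like inclusions.
        params["at!"] = ",".join(exclude)
    # Keep "!" and "," literal so the output matches the examples above.
    return urlencode(params, safe="!,")


print(attribute_query(include=["item", "resources", "reproductions"]))
# fo=json&at=item,resources,reproductions
print(attribute_query(exclude=["more_like_this"]))
# fo=json&at!=more_like_this
```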
    
    

Other Parameters


q

query parameter
Does a keyword search in the metadata and any available full text including video transcripts

Supported Endpoints

  • /search/
  • /{format}/
  • /collections/{name of collection}/
  • /item/{id}/ and /resource/{id}/

Examples:


q=kittens


fa

filter or facet
takes the format filter-name:value
multiple filters can be used by separating them with a pipe character: |

Available filters/facets include:

  • location
  • subject
  • original-format

Supported Endpoints

  • /search/
  • /{format}/
  • /collections/{name of collection}/

Examples:


fa=location:ohio
fa=location:yellowstone national park
fa=subject:meteorology
fa=original-format:periodical|subject:wildlife
fa=partof:performing arts encyclopedia
fa=contributor:lange, dorothea

Many formats are also available as endpoints (e.g. /maps/). Those that are ONLY available using the filters/facets parameter include:


original-format:sound recording
original-format:legislation
original-format:periodical
original-format:personal narrative
original-format:software,e-resource
original-format:3d object

partof

Collections, divisions, and units in the Library of Congress. Most are also available using the collections endpoint. See Part ofs for a list.

Example:

fa=partof:performing arts encyclopedia
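
The filter-name:value format and the pipe separator can be assembled programmatically. A minimal sketch with hypothetical helper names (build_fa, search_url); note that urlencode percent-encodes the colon and pipe characters, which is the standard encoding for a query string.

```python
from urllib.parse import urlencode


def build_fa(filters):
    """Join (name, value) pairs as name:value, separated by pipes."""
    return "|".join(f"{name}:{value}" for name, value in filters)


def search_url(query, filters):
    """Build a /search/ URL that applies the given filters."""
    params = {"q": query, "fa": build_fa(filters), "fo": "json"}
    return "https://www.loc.gov/search/?" + urlencode(params)


print(build_fa([("original-format", "periodical"), ("subject", "wildlife")]))
# original-format:periodical|subject:wildlife
```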


c

results per page. The default is 25

Supported Endpoints

  • /search/
  • /{format}/
  • /collections/{name of collection}/

Examples:


c=50


sp

page in results (results are returned in pages of 25 items unless specified using the c parameter). The first page is sp=1.

Supported Endpoints

  • /search/
  • /{format}/
  • /collections/{name of collection}/

Examples:


sp=2


sb

sort field

Supported Endpoints

  • /search/
  • /{format}/
  • /collections/{name of collection}/

Examples:


sb=date_desc

Available sort options include:


date (earliest to latest)
date_desc (latest to earliest)
title_s (by title)
title_s_desc (reverse by title)
shelf_id (call number/physical location)
shelf_id_desc (reverse by call number/physical location)
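
The c, sp, and sb parameters are typically combined in one request. The sketch below validates the sort field against the options listed above before building a /search/ URL; the function name paged_search_url and its argument names are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Sort options listed on this page.
SORT_OPTIONS = {
    "date", "date_desc",
    "title_s", "title_s_desc",
    "shelf_id", "shelf_id_desc",
}


def paged_search_url(query, page=1, per_page=25, sort=None):
    """Build a /search/ URL with c (page size), sp (page), and sb (sort)."""
    if page < 1:
        raise ValueError("sp starts at 1")
    if sort is not None and sort not in SORT_OPTIONS:
        raise ValueError(f"unknown sort option: {sort}")
    params = {"q": query, "c": per_page, "sp": page}
    if sort is not None:
        params["sb"] = sort
    params["fo"] = "json"
    return "https://www.loc.gov/search/?" + urlencode(params)


print(paged_search_url("kittens", page=2, per_page=50, sort="date_desc"))
# https://www.loc.gov/search/?q=kittens&c=50&sp=2&sb=date_desc&fo=json
```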

Response Objects

A response is the data returned from a request to the loc.gov API in the specified representation (JSON or YAML). Because the API also supplies data to the loc.gov website, responses contain a lot of data (e.g. breadcrumbs, facets) that exists specifically for that purpose. This page focuses on the sections of the JSON response that are most useful for working with resources and items, i.e. responses from:


/item/{id}/
/resource/{id}/

A request to an item or resource endpoint returns a JSON object of the following form:


{
    "views": Object,
    "timestamp": Number,
    "locations": Array,
    "fulltext_service": String,
    "next_issue": String,
    "title_url": String,
    "page": Array,
    "pagination": Object,
    "resource": Object,
    "cite_this": Object,
    "calendar_url": String,
    "previous_issue": String,
    "segments": List,
    "related_items": List,
    "word_coordinates_query": Object,
    "more_like_this": List,
    "articles_and_essays": List,
    "traditional_knowledge_labels": List,
    "item": Object,
    "word_coordinates_pages": Object,
    "type": String,
    "options": Object,
    "resources": List
}

Of these elements, the one of interest is item. The item object has the following structure:


{
    "place_of_publication": String,
    "source_collection": Array,
    "display_offsite": Boolean,
    "contributors": Array,
    "location_county": Array,
    "access_restricted": Boolean,
    "site": Array,
    "original_format": Array,
    "partof_title": Array,
    "date": String,
    "item_type": String,
    "url": String,
    "subject_headings": Array,
    "newspaper_title": Array,
    "created_published": Array,
    "extract_urls": Array,
    "partof_division": Array,
    "contents": Array,
    "subject": Array,
    "index": Number,
    "digital_id": Array,
    "call_number": Array,
    "group": Array,
    "score": Number,
    "location_country": Array,
    "title": String,
    "numeric_shelf_id": Number,
    "description": Array,
    "related_items": Array,
    "id": String,
    "online_format": Array,
    "subjects": Array,
    "location": Array,
    "_version_": Number,
    "mime_type": Array,
    "type": Array,
    "other_formats": Array,
    "library_of_congress_control_number": String,
    "rights_advisory": Array,
    "medium": String,
    "reproduction_number": Array,
    "repository": Array,
    "format": Array,
    "partof": Array,
    "timestamp": String,
    "date_issued": String,
    "reel_numbers": Array,
    "campaigns": Array,
    "raw_lccn": String,
    "number_edition": Array,
    "extract_timestamp": String,
    "genre": Array,
    "number": Array,
    "dates_of_publication": String,
    "partof_collection": Array,
    "other_title": Array,
    "hassegments": Boolean,
    "dates": Array,
    "composite_location": Array,
    "number_lccn": Array,
    "language": Array,
    "rights": Array,
    "locations": Array,
    "notes": Array,
    "shelf_id": String,
    "batch": Array,
    "summary": Array,
    "digitized": Boolean,
    "publication_frequency": Array,
    "resources": Array,
    "aka": Array,
    "contributor_names": Array,
    "image_url": Array,
    "access_advisory": Array
}
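
Since most of the top-level response exists for the website, a client usually pulls out the item object and a handful of its fields. A minimal sketch, assuming the response text is the JSON form shown above; the function name summarize_item and the choice of fields are illustrative only.

```python
import json


def summarize_item(response_text):
    """Return a few commonly used fields from an item response."""
    data = json.loads(response_text)
    item = data.get("item", {})  # tolerate responses without an item object
    return {
        "id": item.get("id"),
        "title": item.get("title"),
        "date": item.get("date"),
        "original_format": item.get("original_format", []),
    }
```

Using .get() rather than direct indexing keeps the sketch from failing on records that lack one of these fields.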

Description of item data

This table provides a description of the important fields in an item object:

Each entry below gives the field name, followed by its description, type, and example value(s).

original_format
Description: The kind of object being described (not the digitized version). If the record is for an entire collection, that is included here.
Type: array
Example: ["map"], ["photo, print, drawing", "collection"]

id
Description: HTTP version of the URL for the item, including its identifier. Always appears. Note: for historical reasons, the ID follows the pattern of an HTTP URL, not an HTTPS URL, even though loc.gov now supports only HTTPS.
Type: string
Example: "http://www.loc.gov/item/2017645977/"

partof
Description: Collections, divisions, units in the Library of Congress, or any of a number of less formal groupings and subgroupings used for organizing content.
Type: array
Example: ["prints and photographs division", "lot 10526", "catalog"]

subject
Description: List of subjects. These are separated elements of the Library of Congress Subject Headings. Geography is not shown here; see the location element. For example, an item with the subject heading "Women -- Afghanistan -- Social conditions" will have ["social conditions", "women's rights"] in the subject element and "afghanistan" in the location element. For the full subject headings, request the JSON for the /item view.
Type: array
Example: ["public interest/advocacy", "history", "september 11 terrorist attacks"]

index
Description: The index number of the result among all results. This starts with 1 and continues through all of the results in the whole set (not just this page).
Type: integer
Example: 1

title
Description: Title of the item.
Type: string
Example: "Women Taxpayers; Women Voters"

online_format
Description: Format available via the website.
Type: array
Example: ["web page"], ["image", "pdf"]

location
Description: Place(s) related to the item. These are extracted from subject headings and other metadata, so there may be duplicates.
Type: array
Example: ["earth (planet)", "planet", "earth"]

mime_type
Description: Formats available for download.
Type: array
Example: ["image/gif", "video/mov", "video/mpeg", "application/x-video", "image/jpeg"]

digitized
Description: Whether this item has been digitized.
Type: boolean
Example: true

description
Description: Often includes a short summary description of the original physical item, written to accompany the item in a list of search results.
Type: array
Example: ["Correspondence. Typed letter regarding Scandinavian production rights to \"Kiss Me. Kate\" Courtesy of Cole Porter Trust (Copyright Notice)."], ["1 photographic print. | Two Pueblo Indian women posed standing, full-length, New Mexico."]

date
Description: Date of item creation. Could be a year or YYYY-MM-DD. Items are sortable by this date.
Type: string
Example: "2002-08-08", "1910"

dates
Description: List of dates related to the item. In ISO 8601 format, UTC. Items are facetable by these dates.
Type: array
Example: ["2001-01-01T00:00:00Z", "2001-10-30T00:00:00Z", "2001-12-15T00:00:00Z", "2002-01-01T00:00:00Z"]

language
Description: Languages associated with the item.
Type: array
Example: ["english", "spanish"]

url
Description: URL on the loc.gov website. If the item is something in the library catalog, the URL will start with lccn.loc.gov.
Type: string
Example: "https://www.loc.gov/item/2017711647/", "//lccn.loc.gov/08030295"

image_url
Description: URLs for images in various sizes, if available. If the item is not something that has an image (e.g. it's a book that's not digitized or an exhibit), the URL for the image might be for an icon image file.
Type: array
Example: ["//cdn.loc.gov/service/pnp/bbc/0000/0000/0004f_150px.jpg"]
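
Two of these fields usually need light post-processing: image_url and url values may be protocol-relative (starting with //), and dates values are ISO 8601 strings in UTC. The helpers below are a sketch with hypothetical names, assuming the dates always use the exact form shown in the example (seconds precision with a trailing Z).

```python
from datetime import datetime, timezone


def absolute_url(url, scheme="https"):
    """Add a scheme to protocol-relative URLs such as //cdn.loc.gov/..."""
    return f"{scheme}:{url}" if url.startswith("//") else url


def parse_dates(dates):
    """Parse values like 2001-01-01T00:00:00Z into UTC-aware datetimes."""
    return [
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        for value in dates
    ]


print(absolute_url("//cdn.loc.gov/service/pnp/bbc/0000/0000/0004f_150px.jpg"))
# https://cdn.loc.gov/service/pnp/bbc/0000/0000/0004f_150px.jpg
```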

Pagination

When accessing the API at the /search/ or /collections/ endpoints it is possible to traverse through the pages of the result programmatically using the pagination section of the response, which has the following structure:


{
    "from": Number,
    "results": String,
    "last": String,
    "total": Number,
    "previous": String,
    "perpage": Number,
    "perpage_options": List,
    "of": Number,
    "next": String,
    "current": Number,
    "to": Number,
    "page_list": List,
    "first": String
}

Description of pagination data

Each entry below gives the field name, followed by its description and an example value.

from
Description: Index number of the first result item in this page of results.
Example: 26

to
Description: Index number of the last result in this page of results.
Example: 50

results
Description: Index numbers of the result items in this page.
Example: "26 - 50"

last
Description: URL of the last page of results in the whole set of results pages.
Example: "https://www.loc.gov/search/?q=giraffe&sp=5&fo=json"

of
Description: Total number of items in the results.
Example: 318

previous
Description: URL of the preceding page of results. Will be null when this is the first page.
Example: "https://www.loc.gov/search/?q=giraffe&sp=1&fo=json"

next
Description: URL of the next page of results. Will be null when there are no more pages.
Example: "https://www.loc.gov/search/?q=giraffe&sp=3&fo=json"

perpage
Description: Number of result items on each page.
Example: 25

total
Description: Total number of pages available.
Example: 5

current
Description: Page number you are currently on.
Example: 2
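
Traversing a result set amounts to following pagination.next until it is null. The following is a sketch under stated assumptions: fetch_json and iter_pages are hypothetical helper names, and the fetch function is a parameter so that the loop can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Request url and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_pages(first_url, fetch=fetch_json, max_pages=None):
    """Yield each page of a response, following pagination.next."""
    url = first_url
    count = 0
    while url and (max_pages is None or count < max_pages):
        page = fetch(url)
        yield page
        count += 1
        # next is null (None in Python) on the last page, which ends the loop.
        url = page.get("pagination", {}).get("next")
```

In real use, pause between requests so that the loop stays within the rate limits listed in the introduction, and pass max_pages to avoid walking an unexpectedly large result set.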

Note about deep paging limitations

Due to the technical limitations of search engine technologies, it is not recommended that users page through a large number of result pages. If the number of result pages is excessive, it will be better to use faceting or more specific search terms to reduce the result set. Paging past the 100,000th item in a search result is not supported at this time. In some searches, responses may fail before 100,000 items.
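
A client can check in advance whether a page falls inside the 100,000-item limit. This is a conservative sketch with hypothetical function names: it rejects any page whose last item would lie past the limit, and it cannot predict the earlier failures that some searches may produce.

```python
MAX_ITEM_INDEX = 100_000  # paging past this item is not supported


def last_item_index(page, per_page=25):
    """Index of the last item on a full page (sp=page, c=per_page)."""
    return page * per_page


def is_within_paging_limit(page, per_page=25):
    """Return True if every item on the page is at or before the limit."""
    return last_item_index(page, per_page) <= MAX_ITEM_INDEX


print(is_within_paging_limit(4000))  # True: items 99,976 to 100,000
print(is_within_paging_limit(4001))  # False: items start at 100,001
```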

Examples

/search/

Parameter: q
Description: The search term

/collections/{name of collection}/

Parameter: name of collection
Description: The title of the LOC collection in slug form

/{format}/

Parameter: format
Description: Returns items which have a specified original format

/item/{id}/

Parameter: id
Description: The identifier for an item

/resource/{id}/

Parameter: id
Description: The identifier for an item