Working Within Limits
Working within the resource limits of the JSON/YAML API for loc.gov
Working with Rate Limits
The API is accessible to the public with no API key or authentication required. Rate limiting is strongly encouraged. Requests that exceed the rate which loc.gov can successfully accommodate will be blocked to prevent a denial of service. It is possible that performance will be impacted by rates that do not exceed the rate limit.
Most programming languages have a library available to automate rate limiting, but it can even be accomplished simply by executing a delay before each network call.
The current rate limits are:
Newspapers endpoint: | |
Burst Limit | 20 requests per 1 minute, Block for 5 minutes |
Crawl Limit: | 20 requests per 10 seconds, Block for 1 hour |
Item endpoint: | |
Burst Limit: | 10 requests per 10 seconds, Block for 5 minutes |
Crawl Limit | 200 requests per 1 minute, Block for 1 hour |
Resource endpoint: | |
Burst Limit: | 40 requests per 10 seconds, Block for 5 minutes |
Crawl Limit | 200 requests per 1 minute, Block for 1 hour |
Collections, format, and other endpoints: | |
Burst Limit | 20 requests per 10 seconds, Block for 5 minutes |
Crawl Limit | 80 requests per 1 minute, Block for 1 hour |
Working Within Page Limits
Deep Paging Limitations
Due to the technical limitations of search engine technologies, it is not recommended that users page through a large number of result pages. The deeper you page into a search result, the more memory each page request takes; this process is called "deep paging."
With our current resources, 100,000 is the limit beyond which there will be adverse impacts on overall performance. As a result, paging past the 100,000th item in a search result is not supported at this time, and in some searches, responses may fail before 100,000 items.
One way to limit the size of a result set is by using more specific
search terms - for instance, if you are searching for "Lincoln,"
changing the query parameter from ?q=Lincoln
to
?q=Abraham+Lincoln
. Another way is to use
faceting to ensure that no multi-page result set has more than 100,000
items. This method is also useful to improve response times for result
sets with fewer than 100,000 results.
Deep Paging Limits | |
Maximum items per query (Paging Limits) | 100,000 items per query |
Items Per Page
In addition, too many results per page can impact the performance of queries. The default number of items per page is set for each collection or endpoint to address user experience with the website, rather than performance with the API, so it is expected that users of the API will request a larger number of items per page. That said, performance on individual page requests is also affected by the number of items per page. It is recommended that requests not exceed 1,000 items per page.
Recommended Items Per Page | ||
Maximum items per page (Recommended for most queries) | 1,000 items per page | ?fo=json&at=results&c=1000&sp=1 |
Using Faceting to Limit Result Set Sizes
Each faceted field contains a list of facet filters representing either date ranges (for dates) or the values that occur most often for those fields (often the top ten most common values). Using the URLs for these facet filters will limit the results to only those which have the value (filter.title) selected for by the facet filter. On a results page on the website, these facets appear on the left column of a results page, with each facet filter value linked to the faceted query.
Some faceted fields use a controlled vocabulary (for instance, the Original Format field), while others do not (such as the Subject field). For some items, a faceted field may only have one value (the Online Format field), while other for other facets (such as the Contributors field) it may have one or more values. This second point means that the same item may show up in more than one faceted query for the same search.
Our recommendation for large data results is that rather than paging through one large query, it is more effective to find the facet values for a particular facet for the query and then request and paginate through each facet-filtered query, combining the results locally. In those collections where there are items that appear in more than one faceted search, you can deduplicate result items by the id field.
It may be best to choose for faceting a field that has a limited set of
values and where there is less likely to be overlap. While many items
do have multiple dates associated with them (for instance, serials and
web archives), the dates facet is particularly amenable to this use
because it is a limited range of values that should provide complete
coverage. For any query, you can facet on a date range by just appending
&dates={start_date}/{end_date}
to the query. For most items
in our collections, the dates are stored as a year.
For fields that have a large number of values, you can retrieve the facets
using the /search/index/{field}/
endpoint. For instance,
for the search https://www.loc.gov/search/?fa=partof:american+memory,
you can paginate through the search index query https://www.loc.gov/search/index/location/?fa=partof:american+memory&fo=json&at=facets,pagination.
This will respond with a facets object containing of a list of
filters for the field you're interested in, in order of
decreasing occurrences. Keep in mind that these facets queries are
themselves very resource-intensive, so try not to paginate through
too large a list of values.
Looking at the date ranges on the results page https://www.loc.gov/collections/vietnam-era-pow-mia-database/, you will see these values:
Date Range | Item Count |
---|---|
2020 to 2029 | 322 |
2010 to 2019 | 2,930 |
2000 to 2009 | 8,392 |
1990 to 1999 | 40,149 |
1980 to 1989 | 54,904 |
1970 to 1979 | 60,942 |
1960 to 1969 | 44,682 |
1950 to 1959 | 127 |
1940 to 1949 | 32 |
1930 to 1939 | 5 |
1920 to 1929 | 10 |
1910 to 1919 | 48 |
These facets are not necessarily mutually exclusive; if there are collection items that were created over a range of time, for example, from 2008 to 20021, those items would show up in the 2000/2009, 2010/2019, and 2020/2029 facets. So even with date faceted queries, you may need to deduplicate these results by id.
If you request the following query -
https://www.loc.gov/collections/vietnam-era-pow-mia-database/?fo=json&at=facets
-
you'll get the list of the facet filters. In this case, the date filter is
the fourth filter (0-indexed):
https://www.loc.gov/collections/vietnam-era-pow-mia-database/?fo=json&at=facets.3
"facets.3": {
"filters": [
{
"count": 322,
"not": null,
"off": null,
"on": "https://www.loc.gov/collections/vietnam-era-pow-mia-database/?dates=2020/2029&fo=json",
"term": "2020-01-01T00:00:00Z",
"title": "2020 to 2029"
},
{
"count": 2932,
"not": null,
"off": null,
"on": "https://www.loc.gov/collections/vietnam-era-pow-mia-database/?dates=2010/2019&fo=json",
"term": "2010-01-01T00:00:00Z",
"title": "2010 to 2019"
},
[… deleted several other filters …]
{
"count": 49,
"not": null,
"off": null,
"on": "https://www.loc.gov/collections/vietnam-era-pow-mia-database/?dates=1910/1919&fo=json",
"term": "1910-01-01T00:00:00Z",
"title": "1910 to 1919"
}
],
"title": "Date",
"type": "dates"
}
}
You can take each of those "on" (i.e., "filter is on") urls and append &at=results&c=1000&sp=1
to get the
first 1000 results of that faceted query, and increment the sp
parameter to get 1000 results per page until
you get fewer than 1000 results, which will be the last page of the result set. (Requesting more than
1000 results per page will also impact performance; if you notice any problems, try scaling down to 500, then 100.)
The search results metadata isn't complete; to get the complete metadata for each time, you'll want to pull the
url for each item in the result set to get the complete metadata; for that, you can just append
&fo=json&at=item,resources
to the url value in the item result. Following are clipped outputs of a faceted query for
a search result, then a query for the item record for a single search result.
{
"results": [
[...deleted multiple fields from result set item...]
{
"access_restricted": false,
"aka": [
"https://www.loc.gov/item/powmia/pwmaster_103790/",
"https://www.loc.gov/resource/powmia/pwmaster_103790/"
],
"contributor": [
"commander us military assistance command vietnam"
],
"date": "1967-07-19",
"dates": [
"1967-07-19T00:00:00Z",
"1966-02-01T00:00:00Z"
],
"image_url": [
"https://tile.loc.gov/storage-services/service/frd/pwmia/49/103790.gif"
],
"online_format": [
"pdf"
],
"timestamp": "2022-03-10T23:31:44.885Z",
"title": "CAPTURED US AIRMAN",
"url": "https://www.loc.gov/item/powmia/pwmaster_103790/"
},
https://www.loc.gov/item/powmia/pwmaster_103790/?fo=json&at=item,resources
[...deleted multiple fields from item...]
"item": {
"dates": [
{
"1966 to 1967": "https://www.loc.gov/search/?dates=1966/1967&fo=json"
}
],
"format": [
{
"manuscript/mixed material": "https://www.loc.gov/search/?fa=original_format:manuscript/mixed+material&fo=json"
}
],
"image_url": [
"https://tile.loc.gov/storage-services/service/frd/pwmia/49/103790.gif"
],
"resources": [
{
"files": [
],
"pdf": "https://tile.loc.gov/storage-services/service//frd/pwmia/49/103790.pdf",
"tif": null,
"url": "https://www.loc.gov/item/powmia/pwmaster_103790/"
}
]
}