Program Web Archiving

For Researchers

The Library of Congress web archives are organized in thematic and event-based collections, and contain web sites documenting a variety of U.S. and international organizations representing a broad range of subjects and topic areas. Examples include select U.S. government sites from the Legislative, Judicial, and Executive branch agencies; select foreign government sites; campaign web sites and political parties documenting U.S. and select foreign elections; non-profit organizations; journalism and news; creative sites such as those documenting comics, music, authors, and art; legal sites; and international organizations. While most web archives are collected as a part of one or more event or thematic archives, the Library also preserves other sites within its general web archives.

Researchers interested in using the web archives can access collections by visiting Archived Web Sites to search and browse descriptive records. Because of the size and extent of our archives, we have archived content that has not been fully described. To view additional content in the web archives, you can also search by URL. Full text search of the web archives is not currently available.

For more information, see Tips on Searching the Web Archive.

Access Restrictions

Not all content that the Library has archived is currently available through the Library’s web site. Limitations affecting access to the archived content include:

  • Content Embargos: The Library has a one-year embargo period for all content in the archive. Content outside of the embargo period is updated and made available regularly.
  • Permissions: Some archived content may not be accessible offsite if the owners have not granted the Library explicit permission to display their archived content offsite. In these cases, the Library may identify a site as part of a collection, but only display a catalog record and a thumbnail image of the site to offsite researchers.
  • Processing requirements/workflow: There may be additional captures or web sites available through URL search that have not yet been fully processed by Library staff for access.
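The embargo and permission rules above can be sketched as a small decision function. This is only an illustration, not the Library's actual access logic: the names `embargo_end` and `access_level` are hypothetical, and the sketch assumes that the one-year embargo is counted from the capture date, that a capture becomes available on the anniversary itself, and that onsite researchers are not affected by the offsite permission rule. Processing status is left out.

```python
from datetime import date

EMBARGO_YEARS = 1  # the page states a one-year embargo for all content


def embargo_end(capture_date: date) -> date:
    """First date on which a capture is outside the embargo (assumed rule)."""
    try:
        return capture_date.replace(year=capture_date.year + EMBARGO_YEARS)
    except ValueError:
        # Captured on Feb 29: the next year has no Feb 29, so use Mar 1.
        return date(capture_date.year + EMBARGO_YEARS, 3, 1)


def access_level(capture_date: date, today: date,
                 has_offsite_permission: bool, onsite: bool) -> str:
    """Classify what a researcher could see for one capture.

    Returns "embargoed", "record_only" (catalog record and thumbnail),
    or "full". The labels are illustrative, not Library terminology.
    """
    if today < embargo_end(capture_date):
        return "embargoed"
    if not onsite and not has_offsite_permission:
        return "record_only"
    return "full"


if __name__ == "__main__":
    # An offsite researcher looking at a capture without offsite permission.
    print(access_level(date(2022, 3, 15), date(2024, 1, 10),
                       has_offsite_permission=False, onsite=False))
```

Keeping the embargo check first mirrors the text: the embargo applies to all content, while the permission rule only changes what offsite researchers see.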

Technical Limitations of Web Archiving

Web content is archived at particular points in time by archival-quality harvesting software known as crawlers. The Library intends to reflect as completely as possible how the web site looked and behaved at the time it was archived. The crawler attempts to gather the objects associated with a web site, including HTML, images, PDF documents, and audio and video files. Web crawlers have technical limitations, however, and typically cannot capture streaming media, or deep web or database content that requires user input. Interactive components based on scripts, and content that requires plug-ins for rendering, are also difficult to capture with existing web archiving tools.

Embedded content is generally included in the crawls automatically. However, because of our permissions policies, we must give the crawler explicit instructions about content that web sites host on third-party sites, such as social media accounts. The Library uses "scoping" instructions to direct the crawler to desired content on other domains, as we are able to identify these resources.

Note that "scoped" URLs are given lower priority for crawling than seed URLs, so scoped URLs may not be captured as comprehensively as other content in the archive. Scoped URLs appear in the item records found at loc.gov/websites.
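To make the seed and scope distinction concrete, here is a minimal sketch of a crawl queue that visits seed-site URLs before scoped third-party URLs and ignores other off-site links. It is not the Library's crawler or configuration: the function names, the priority values, and the idea of matching scoped content by URL prefix are all assumptions made for illustration, and the example hosts are placeholders. Embedded content, which the text says is generally captured automatically, is not modeled.

```python
import heapq
from urllib.parse import urlsplit

SEED_PRIORITY = 0    # crawled first (assumed ordering)
SCOPED_PRIORITY = 1  # crawled after seeds, so coverage may be thinner


def host_of(url: str) -> str:
    """Lower-cased host name of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def classify(url, seed_hosts, scoped_prefixes):
    """Return a priority for in-scope URLs, or None if out of scope."""
    if host_of(url) in seed_hosts:
        return SEED_PRIORITY
    if any(url.startswith(prefix) for prefix in scoped_prefixes):
        return SCOPED_PRIORITY
    return None


def crawl_order(urls, seed_hosts, scoped_prefixes):
    """Order discovered URLs: seeds first, then scoped, dropping the rest."""
    heap = []
    for position, url in enumerate(urls):
        priority = classify(url, seed_hosts, scoped_prefixes)
        if priority is not None:
            # position keeps discovery order within the same priority
            heapq.heappush(heap, (priority, position, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    discovered = [
        "https://social.example/orgaccount/post1",
        "https://example.org/about",
        "https://ads.example/banner",
        "https://example.org/news",
    ]
    for url in crawl_order(discovered, {"example.org"},
                           ["https://social.example/orgaccount"]):
        print(url)
```

In a crawl with a limited time or size budget, an ordering like this would explain why scoped URLs can end up less completely captured than seed content.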

Because of these technical limitations, not all web sites are archived completely and there may be gaps in the archive.

For more about our process, visit About This Program FAQ.
