The Library of Congress web archives are organized in thematic and event-based collections, and contain websites documenting a variety of U.S. and international organizations representing a broad range of subjects and topic areas. Examples include select U.S. government sites from the Legislative, Judicial, and Executive branch agencies; select foreign government sites; campaign websites and political parties documenting U.S. and select foreign elections; non-profit organizations; journalism and news; creative sites such as those documenting comics, music, authors, and art; legal sites; and international organizations. While most web archives are collected as a part of one or more event or thematic archives, the Library also preserves other sites within its general web archives.
Researchers interested in using the web archives can access collections by visiting Archived websites to search and browse descriptive records. Because of the size and extent of our archives, we have archived content that has not been fully described. To view additional content in the web archives, you can also search by URL. Full text search of the web archives is not currently available.
For more information, see Tips on Searching the Web Archive.
Not all content that the Library has archived is currently available through the Library’s website. Limitations affecting access to the archived content include:
- Content Embargoes: The Library has a one-year embargo period for all content in the archive. Content outside of the embargo period is updated and made available regularly.
- Permissions: Some archived content may not be accessible offsite if the owners have not granted the Library explicit permission to display their archived content offsite. In these cases, the Library may identify a site as part of a collection, but only display a catalog record and a thumbnail image of the site to offsite researchers.
- Processing requirements/workflow: There may be additional captures or websites available through URL search that have not yet been fully processed by Library staff for access.
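As a rough illustration of the embargo rule described above, the sketch below checks whether a capture has passed a one-year embargo. It assumes the embargo is counted from the capture date, which this page does not state explicitly. The names `embargo_end` and `is_outside_embargo` are hypothetical and do not come from any Library system.

```python
from datetime import date

EMBARGO_YEARS = 1  # one-year embargo, as stated in the list above


def embargo_end(capture_date: date) -> date:
    """Return the first date on which a capture is outside the embargo.

    Assumption: the embargo is measured from the capture date.
    """
    try:
        return capture_date.replace(year=capture_date.year + EMBARGO_YEARS)
    except ValueError:
        # Captured on Feb 29: the target year has no Feb 29, so use Mar 1.
        return date(capture_date.year + EMBARGO_YEARS, 3, 1)


def is_outside_embargo(capture_date: date, today: date) -> bool:
    """True once the embargo period for this capture has elapsed."""
    return today >= embargo_end(capture_date)
```

Even when a capture is outside the embargo, it may still be unavailable offsite because of the permissions and processing limitations listed above.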
Technical Limitations of Web Archiving
Web content is archived at particular points in time by archival-quality harvesting software, known as crawlers. The Library intends to reflect as completely as possible how the website looked and behaved at the time it was archived. The crawler attempts to gather the objects associated with a website, including HTML, images, PDF documents, and audio and video files. Web crawlers have technical limitations, and typically are unable to capture streaming media, deep web content, or database content requiring user input. Interactive components based on programming scripts, and content that requires plug-ins for rendering, are also difficult to capture with existing web archiving tools.
Embedded content is generally included in the crawls automatically. However, because of our permissions policies, we must provide explicit instructions to the crawler regarding content that websites host on third-party sites, such as social media accounts. The Library uses "scoping" instructions to direct the crawler to desired content on other domains, as we are able to identify these resources.
Note that "scoped" URLs are given less priority for crawling than seed URLs, so scoped URLs may not be captured as comprehensively as other content in the archive. Scoped URLs appear in the item records found at loc.gov/websites.
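To show why lower priority can lead to less complete capture, here is a minimal sketch of a crawl queue that always serves seed URLs before scoped URLs. It is an illustration only, not the Library's crawler: the `Frontier` class, the `SEED` and `SCOPED` constants, the fixed crawl budget, and the example URLs are all assumptions made for this example.

```python
import heapq
from itertools import count

SEED, SCOPED = 0, 1  # hypothetical priority levels; lower number is served first


class Frontier:
    """Toy crawl queue that serves seed URLs before scoped URLs."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that keeps insertion order within a level
        self._seen = set()

    def add(self, url, kind):
        """Queue a URL once, tagged as SEED or SCOPED."""
        if url in self._seen:
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (kind, next(self._order), url))

    def pop_batch(self, budget):
        """Return up to `budget` URLs, highest priority first."""
        batch = []
        while self._heap and len(batch) < budget:
            _, _, url = heapq.heappop(self._heap)
            batch.append(url)
        return batch


frontier = Frontier()
frontier.add("https://social.example/org-account", SCOPED)
frontier.add("https://example.org/", SEED)
frontier.add("https://example.org/about", SEED)

# With a budget of two fetches, both seed URLs are crawled and the scoped URL is not.
crawled = frontier.pop_batch(2)
```

In this sketch the scoped URL is only reached if crawl budget remains after the seed URLs, which is one way gaps in scoped content could arise.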
Because of these technical limitations, not all websites are archived completely and there may be gaps in the archive.
For more about our process, visit About This Program FAQ.