Web Archiving Program

Frequently Asked Questions

I was contacted via email by the Library of Congress about archiving of my site. Is this a real request? Is it safe to click on the link in the email?

The Library is notifying site owners through email.

The Library uses a permissions tool that allows easy contact with site owners via email and enables the site owners to respond to permissions requests using a web form. The responses are then recorded in a database.

The email you receive from the Library of Congress contains webcapture AT loc DOT gov in the "from" address and "Inclusion of your Website in the Library of Congress Web Archives" in the subject line.

If you would like to confirm that the Library sent the permission email, please contact us and a member of the Web Archiving Team will assist you.
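As a first-pass check, the two header values described above can be compared programmatically. The following is a minimal sketch in Python, assuming the message is available as raw text; the function name `headers_match` is hypothetical. Header values can be forged, so a match does not prove that the Library sent the message, and contacting the Web Archiving Team remains the way to confirm.

```python
from email import message_from_string
from email.utils import parseaddr

# Values taken from the answer above; the address is written out in plain form.
EXPECTED_SENDER = "webcapture@loc.gov"
EXPECTED_SUBJECT = "Inclusion of your Website in the Library of Congress Web Archives"


def headers_match(raw_message: str) -> bool:
    """Return True if the From and Subject headers look like a Library request.

    This is only a plausibility check: headers can be spoofed.
    """
    msg = message_from_string(raw_message)
    # parseaddr splits "Display Name <user@host>" into (name, address).
    _, sender = parseaddr(msg.get("From", ""))
    subject = msg.get("Subject", "")
    return sender.lower() == EXPECTED_SENDER and EXPECTED_SUBJECT in subject


if __name__ == "__main__":
    sample = (
        "From: Library of Congress <webcapture@loc.gov>\n"
        "Subject: Inclusion of your Website in the Library of Congress Web Archives\n"
        "\n"
        "Message body\n"
    )
    print(headers_match(sample))
```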

I am having difficulty filling out your permissions form.

If you have problems with the form, please contact us or reply to the permission request email, and staff from the Library of Congress Web Archiving Program will assist you.

What does it mean to grant or deny permission to allow the Library to archive?

If you grant the Library permission to archive, it means that the Library of Congress will crawl your website and include archived copies in a larger collection of historically and culturally significant websites that have been designated for preservation. If we do not receive a response to the permission form or you deny permission to archive, the Library is unable to preserve your website and include it in our collections.

What does it mean to grant or deny permission to allow the Library to display off-site?

If you grant the Library permission to display your web archive off-site, it means the Library of Congress will provide public access to the archived copies of your website through loc.gov/. If you deny off-site access, the Library may catalog and identify the site as part of a particular collection on our public website, and provide metadata and a thumbnail image of the web archive, but the web archive of your site will only be available to researchers who visit the Library of Congress buildings in Washington, D.C., and by special arrangement.   

Why was my website selected?

The Library archives websites that are selected by the Library’s subject experts, known as Recommending Officers, based on guidance set forth in subject-focused Collections Policy Statements and the format-focused Supplementary Guidelines for Web Archiving and Social Media. Collecting occurs around subjects, themes, and events identified by Library staff. Recommending Officers select “seed” URLs, which are a starting point for the crawler, and can be a full domain, a subdomain, or simply one page or document – whatever is the desired web content to archive. Depending on the topic of the site, it might have been selected for archiving in multiple thematic or event archives. Our archives cover a wide variety of subjects and topics, with web content published in the United States and internationally.
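To illustrate how a seed URL can stand for a full domain, a subdomain, or a single page, here is a minimal sketch in Python. The function `in_scope` and its matching rule are assumptions made for illustration only; they are not the Library's or Heritrix's actual scoping logic.

```python
from urllib.parse import urlsplit


def in_scope(seed: str, candidate: str) -> bool:
    """Hypothetical scope rule: does `candidate` fall under `seed`?

    - A seed whose path ends in "/" covers that host, its subdomains,
      and every path beneath the seed path.
    - Any other seed is treated as a single page or document and matches
      only the same host and the same path.
    """
    s, c = urlsplit(seed), urlsplit(candidate)
    seed_host = (s.hostname or "").lower()
    cand_host = (c.hostname or "").lower()
    seed_path = s.path or "/"
    cand_path = c.path or "/"

    if not seed_host:
        return False
    if seed_path.endswith("/"):
        host_ok = cand_host == seed_host or cand_host.endswith("." + seed_host)
        return host_ok and cand_path.startswith(seed_path)
    return cand_host == seed_host and cand_path == seed_path


if __name__ == "__main__":
    # example.org is a placeholder domain.
    print(in_scope("https://example.org/", "https://blog.example.org/post/1"))
    print(in_scope("https://blog.example.org/", "https://example.org/about"))
    print(in_scope("https://example.org/report.pdf", "https://example.org/other.pdf"))
```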

How often and for how long will you collect my site?

The Library archives sites at various frequencies and for various time periods based on the type of site and the collection it was selected for. Typically the Library crawls websites once a week, once a month, or quarterly, depending on how frequently the content changes, with some sites crawled less frequently. With thousands of seeds being crawled across the frequencies, it is not possible to state an exact time that our (or our agent's) web crawlers will access a given website.

The Library may crawl your site for a specific period of time or on an ongoing basis. This varies depending on the scope of a particular project. Some archiving activities are related to a time-sensitive event, such as before and immediately after a national election. Other collections we are developing may be ongoing with no specified end date, in order to capture changes in websites over a longer period of time. 

What should I do if your crawler causes problems with my site?

The Library or its agent crawls sites politely in order to minimize server impact. The various speed settings are explained in the Heritrix 3 — Configuring Crawl Jobs documentation. Problems are rare but may occur. Please contact us immediately if you have problems or questions.

My site has a password-protected area that requires a user ID and password. Will this protected content be archived?

The Library does not archive password-protected content except by special permission from the site owner.

I have a robots.txt exclusion on my website to block crawlers from certain parts of my site. How does this affect your collecting activity?

The Library attempts to collect as much of the site as possible in order to create an accurate, reproducible copy for future researchers, and because of our permissions policies, we generally bypass robots.txt exclusions. Please contact us immediately if you have questions about this policy.
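Site owners who want to see which of their URLs a robots.txt-compliant crawler would skip, and which a crawl that bypasses robots.txt could therefore still collect, can test their rules locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module; the robots.txt content, the URLs, and the user agent string `example-crawler` are placeholders, not the Library's actual values.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Placeholder rules and URLs for illustration.
    robots = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "https://example.org/index.html",
        "https://example.org/private/report.html",
    ]
    print(blocked_urls(robots, "example-crawler", candidates))
```

URLs returned by this check are the ones a compliant crawler would skip; under the policy described above, they may still be included in the Library's capture.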

I use a third-party service to protect my website from unwanted bot traffic. Can my website still be included in the archive?

You may be able to address this by including our web crawlers on an "allow-list." Please contact us for our crawler IP addresses and user agent.
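The logic behind such an allow-list can be sketched as follows, in Python. The network range and user agent token below are placeholders (192.0.2.0/24 is a range reserved for documentation); the real crawler IP addresses and user agent must be obtained from the Library, and how the rule is configured depends on the bot-protection service in use.

```python
import ipaddress

# Placeholder values: replace with the addresses and user agent supplied by the Library.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ALLOWED_AGENT_TOKEN = "example-archive-crawler"  # hypothetical token


def is_allow_listed(remote_ip: str, user_agent: str) -> bool:
    """Return True only if both the source IP and the user agent match the allow-list."""
    addr = ipaddress.ip_address(remote_ip)
    ip_ok = any(addr in network for network in ALLOWED_NETWORKS)
    agent_ok = ALLOWED_AGENT_TOKEN in user_agent.lower()
    return ip_ok and agent_ok


if __name__ == "__main__":
    agent = "Mozilla/5.0 (compatible; example-archive-crawler)"
    print(is_allow_listed("192.0.2.15", agent))
    print(is_allow_listed("203.0.113.9", agent))
```

Requiring both conditions is a deliberate choice in this sketch: a user agent string alone is easy to imitate, so pairing it with the source address narrows the exemption.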

Do we need to contact you if our URL changes?

The Library periodically monitors websites for changes that might affect the crawler, and the crawler provides reports with redirect information, so in general you do not need to notify us of changes. However, we appreciate any updates site owners would like to provide. 

Is there anything I can do to make my website easier for you to archive? 

The Library has published a guide, Creating Preservable Websites, which offers information for site owners who are interested in this topic.

How do researchers access the web archives?

Researchers access publicly available web archive collections here. Users may browse or search the metadata of available archives and search by URL; full-text search of the archived websites is not available. If a site owner has not granted the Library permission to display the content outside Library premises, the metadata and a thumbnail will be available off-site, but not the archived content itself (researchers must come on-site to view the web archive). Visit For Researchers for more information about how the web archives are accessed and used.

What will people see when they access the web archive?

Your archived site will appear much like it was on the day it was archived, and we likely will have multiple captures of the site in our archive, recording changes over time. The Library tries to capture the content as well as the look and feel of the websites in the archive. When viewing the web archive, users will see a banner (see this example) at the top of the page that alerts researchers that they are viewing an archived version. The date that the site was archived also appears in this banner. Researchers will be able to navigate the archived site much like the live web. However, some items do not work in the archive, such as mailto links, forms, fields requiring input (e.g. search boxes), some multimedia, and some social networking sites. 

When will my archived site be available to view?

Web archive collections are made available as permissions, Library policies, and resources permit. The Library will generally wait at least one year from the initial capture of the website before making it available to researchers; sometimes this period is longer due to the production and descriptive cataloging work that helps to make the web archives available and searchable.

Will there be a link from your archive to my site as it currently exists? 

The Library's web archive will record and display the original URL that we archived, but it will not be hyperlinked to your site nor updated if your site changes, as it is an artifact of the site at the time that we archived it. The public will need to visit your live website in order to retrieve current information. 

What if I do not want the archived version of my website to be available on the Library’s website? How do I opt out?

If you are a copyright owner of or otherwise have exclusive control over materials presently in the archive, you can opt out of online access (outside of Library premises) to the archived version of your website by contacting us. Please provide a link to the URL in our archive when submitting your request, and, if you have an original email the Library sent you to notify you or to seek permission, please provide the tracking information at the bottom of the email to help the Library identify your URL in its collections. Please note that archived content may still be available to scholars on the Library’s premises and by special arrangement. 

What are the copyright implications of the archiving of my site?

The copyright status of your site remains with you. Please see our Rights & Access statement that accompanies each web archive collection. 

Will the Library of Congress take over hosting of my site?

No. By archiving your site, the Library of Congress is preserving a reproducible copy of the site as it was at a particular point in time. You are still responsible for hosting and maintaining your live website.

I would like to archive my own website. Can you help me?

The Library of Congress program does not currently assist individuals in the archiving of their personal or organizational websites. However, for site owners interested in archiving content, information is provided by the International Internet Preservation Consortium and the Digital Preservation Coalition.

Contact Us

Comments, questions, and suggestions related to Web Archiving and this website can be sent to us online.

Location

Web Archiving Program
Library of Congress
101 Independence Avenue, SE
Washington DC 20540-1310