Program Web Archiving

Frequently Asked Questions

I was contacted via email by the Library of Congress about archiving of my site. Is this a real request? Is it safe to click on the link in the email?

The Library is notifying site owners through email.

The Library uses a permissions tool that allows easy contact with site owners via email and enables the site owners to respond to permissions requests using a web form. The responses are then recorded in a database.

The email you receive from the Library of Congress contains webcapture @ loc.gov in the "from" address, and "Inclusion of your Web Site in the Library of Congress Web Archives" in the subject line.

If you would like to confirm that the Library sent the permission e-mail, please contact us and a member of the Web Archiving Team will assist you.  
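As a first check before contacting the Library, a site owner could compare a saved copy of the message against the two details given above. The following is a minimal sketch, not a Library tool. It assumes the sender address is written without spaces (webcapture@loc.gov) and that the subject matches the wording quoted above; the function name is hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Assumption: the address quoted in the FAQ, written without spaces.
EXPECTED_SENDER = "webcapture@loc.gov"
# Subject wording as quoted in the FAQ.
EXPECTED_SUBJECT = "Inclusion of your Web Site in the Library of Congress Web Archives"


def looks_like_library_request(raw_message: str) -> bool:
    """Return True if the From and Subject headers match the FAQ's description."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Display Name <addr>" into (name, addr).
    _, sender = parseaddr(msg.get("From", ""))
    # Collapse whitespace so a subject folded across lines still matches.
    subject = " ".join(str(msg.get("Subject", "")).split())
    return sender.lower() == EXPECTED_SENDER and EXPECTED_SUBJECT in subject


sample = (
    "From: Library of Congress <webcapture@loc.gov>\n"
    "Subject: Inclusion of your Web Site in the Library of Congress Web Archives\n"
    "\n"
    "Message body goes here.\n"
)
print(looks_like_library_request(sample))
```

Matching headers do not prove that a message is genuine, because the "from" address and subject line can be forged. If there is any doubt, contacting the Web Archiving Team remains the reliable way to confirm a request.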

What does it mean to grant or deny permission to allow the Library to display offsite?

If you grant the Library permission to display your archived web site offsite, it means the Library of Congress will provide public access to the archived copies of your web site through loc.gov/. If you deny offsite access, the Library may catalog and identify the site as part of a particular collection on our public web site, and provide metadata and a thumbnail image of the web archive, but the web archive of your site will only be available to researchers who visit the Library of Congress buildings in Washington, D.C., and by special arrangement.   

I am having difficulty filling out your permissions form.

Please contact us if you have problems with the form, or reply to the permission request email and someone from the Library’s project team will assist you.

Why have I received multiple permission requests from the Library of Congress?

In previous years, the Library was required to send permission notices to all selected web sites in every collection it initiated, even if the site had previously granted or denied permission. Policies changed in 2006 and the Library can now request and apply blanket permission. This means that if a site owner granted permission after 2006, the Library can use that permission for future collections. This has minimized duplication in permission requests; however, the Web Archiving Team occasionally contacts site owners for additional permissions if required.

Why was my web site selected?

The Library maintains a collections policy statement and other internal documents to guide the selection of electronic resources, including web sites. Web sites are selected for archiving by Library Recommending Officers. Sites in the web archive are generally representative samples of web content that document an event or cover a particular theme or subject area for our thematic and event collections. 

How often and for how long will you collect my site?

The Library archives sites at various frequencies and for various time periods based on the type of site and the collection it was selected for. Typically the Library crawls web sites once a week, once a month, or quarterly, depending on how frequently the content changes. Some sites are crawled less frequently—just once or twice a year. In some instances, the Library uses RSS feeds to identify rapidly changing content and to crawl multiple times per day. 

The Library may crawl your site for a specific period of time or on an ongoing basis. This varies depending on the scope of a particular project. Some archiving activities are related to a time-sensitive event, such as before and immediately after a national election. Other collections we are developing may be ongoing with no specified end date, in order to capture changes in web sites over a longer period of time. 

What should I do if your crawler causes problems with my site?

The Library or its agent always tries to crawl sites politely in order to minimize server impact. Occasionally there may be problems. Please contact us immediately if you have problems or questions.

My site has a password-protected area that requires a user ID and password. Will this protected content be archived?

The Library does not archive password-protected content, unless by special permission from the site owner.

I have a robots.txt exclusion on my web site to block crawlers from certain parts of my site. How does this affect your collecting activity?

The Library attempts to collect as much of the site as possible in order to create an accurate snapshot for future researchers, and because of our permissions policies, we generally bypass robots.txt exclusions. Please contact us immediately if you have questions about this policy.
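Site owners who want to see which parts of their site a robots.txt file excludes for crawlers that honor it, and which the Library may therefore still collect under the policy above, can check with Python's standard urllib.robotparser module. This is a minimal sketch: the function name, the example.org address, and the sample paths are hypothetical, and it says nothing about how the Library's own crawler is configured.

```python
from urllib.robotparser import RobotFileParser


def excluded_paths(robots_txt: str, base_url: str, paths: list[str],
                   user_agent: str = "*") -> list[str]:
    """Return the paths that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, base_url + p)]


# Hypothetical robots.txt content and site paths, for illustration only.
robots = "User-agent: *\nDisallow: /private/\n"
paths = ["/", "/about.html", "/private/report.html"]
print(excluded_paths(robots, "https://example.org", paths))
```

Paths reported by this check are the ones a robots.txt-compliant crawler would skip. Because the Library generally bypasses these exclusions, owners with concerns about such areas should contact the Library directly.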

Do we need to contact you if our URL changes?

The Library periodically monitors web sites for changes that might affect the crawler, and the crawler provides reports with redirect information, so in general you do not need to notify us of changes. However, we appreciate any updates site owners would like to provide. 

Is there anything I can do to make my web site easier for you to archive? 

The Library has published a guide to Creating Preservable Web Sites which offers information for site owners who are interested in this topic.

How do researchers access the archived web sites?

Researchers access publicly available web archive collections here. Users may browse or search the metadata of available archives and perform a search by URL (but not search the full-text version of the web sites). If a site owner has not granted the Library permission to display the content outside library premises, the metadata and a thumbnail will be available offsite, but not the archived content itself (researchers must come onsite to view the web archive). Visit For Researchers for more information about how the web archives are accessed and used. 

What will people see when they access the archived site?

Your archived site will appear much like it was on the day it was archived, and we likely will have multiple captures of the site in our archive, recording changes over time. The Library tries to capture the content as well as the look and feel of the web sites in the archive. When viewing the web archive, users will see a banner (see this example) at the top of the page that alerts researchers that they are viewing an archived version. The date that the site was archived also appears in this banner. Researchers will be able to navigate the archived site much like the live web. However, some items do not work in the archive, such as mailto links, forms, fields requiring input (e.g. search boxes), some multimedia, and some social networking sites. 

When will my archived site be available to researchers?

Web archive collections are made available as permissions, Library policies, and resources permit. The Library will generally wait at least one year from the initial capture of the web site before making it available to researchers; sometimes this period is longer due to production and descriptive cataloging work that help make the web archives available and searchable. If you have concerns about public access to the archived version of your web site, or if you would like additional information, please contact us directly.

Will there be a link from your archive to my site as it currently exists? 

The Library's web archive will record and display the original URL that we archived, but it will not be hyperlinked to your site nor updated if your site changes, as it is an artifact of the site at the time that we archived it. The public will need to visit your live web site in order to retrieve current information. 

What if I do not want the archived version of my web site to be available on the Library’s web site? How do I opt out?

If you are a copyright owner of or otherwise have exclusive control over materials presently in the archive, you can opt out of online access (outside of Library premises) to the archived version of your web site by contacting us. Please provide a link to the URL in our archive when submitting your request, and, if you have an original email the Library sent you to notify you or to seek permission, please provide the tracking information at the bottom of the email to help the Library identify your URL in its collections. Please note that archived content may still be available to scholars on the Library’s premises and by special arrangement. 

What are the copyright implications of the archiving of our site?

The copyright in your site remains with you. Please see the Rights & Access statement that accompanies each web archive collection.

Will the Library of Congress take over hosting of my site?

No. By archiving your site, the Library of Congress is preserving a snapshot of your site at a particular time. You are still responsible for hosting and maintaining your live web site.

I would like to archive my own web site. Can you help me?

The Library of Congress program does not currently assist individuals in the archiving of their personal or organizational web sites. However, information for site owners interested in archiving content is provided by the International Internet Preservation Consortium.
