The Library of Congress >> MINERVA Home
MINERVA: Mapping the Internet Electronic Resources Virtual Archive

Notice to Webmasters


If you have linked directly to this page, your site is being crawled by the Internet Archive (www.archive.org) on behalf of the Library of Congress. More details on the project can be found below.

If crawling is impacting the performance of your site or you have other concerns, please email immediately:

archive dash crawler dash agent at-symbol lists dot sourceforge dot net

and copy the Library of Congress project team at webcapture@loc.gov.

About the Minerva Web Preservation Project

The United States Library of Congress preserves the Nation's cultural artifacts and provides enduring access to them. The Library's traditional functions---acquiring, cataloging, preserving and serving collection materials of historical importance to the Congress and to the American people to foster education and scholarship---extend to digital materials, including Web sites. The Library has selected your site for inclusion in the historic collection of Internet materials. An email notification with further information has been sent separately to a contact at your organization identified by our team. Rather than send to webmaster@ or info@ addresses and risk bounced or filtered messages, we identified contact information for site owners, managers, directors, etc. to ensure successful delivery.

How We Collect Your Web Site

The Library of Congress has contracted with the Internet Archive to collect content from Web sites at regular intervals as specified in the notification sent to your Web site. We have a number of ongoing collections; each notification states the specific project, be it the Election 2004 Web Archive, the War on Iraq Web Archive, or another thematic archive. The Internet Archive uses the Heritrix crawler to collect Web sites on behalf of the Library of Congress. For more information on Heritrix see http://crawler.archive.org/index.html
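If you want to gauge how heavily the crawler is visiting your site before contacting us, one option is to count its requests in your server's access log. The following is only a rough sketch under stated assumptions: it assumes your log uses the common "combined" log format and that the crawler's User-Agent string contains the token "heritrix". Both are assumptions, not something this notice guarantees, and the function name and sample values are purely illustrative.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [timestamp] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits_per_minute(lines, agent_token="heritrix"):
    """Count requests per minute whose User-Agent contains agent_token.

    agent_token is an assumption; check your own logs for the exact string.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_token.lower() in match.group("agent").lower():
            # A timestamp looks like 28/Sep/2006:10:15:42 -0400;
            # the first 17 characters identify the minute.
            hits[match.group("ts")[:17]] += 1
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python crawler_hits.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, count in sorted(crawler_hits_per_minute(log).items()):
            print(minute, count)
```

A sustained high count per minute is the kind of detail worth including when you email the addresses given above.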

We bypass robots.txt exclusions (http://www.robotstxt.org/wc/robots.html) in order to capture a complete representation of your site in our archive. We crawl to the fullest scope so that our archives represent your site accurately. We definitely want to include any objects that contribute to the look and feel of your site, such as:

Images
JavaScript
CSS

and any other object that contributes to content or the presentation/navigation features of your site. We would prefer not to crawl anything that is not intended for the general public, such as administrative sections. We would be happy to discuss ways in which you can allow desired objects and content to be harvested by the Library of Congress or its agents, or disallow sections that we do not wish to collect.
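To see why a crawl that honored robots.txt could miss look-and-feel objects, you can check which of your own paths your robots.txt would exclude. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the paths, and the helper name blocked_paths are hypothetical examples, not taken from any real site or from the Library's tooling.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths a robots.txt-respecting crawler would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt that hides presentation assets and an admin area.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /css/
Disallow: /admin/
"""

# Hypothetical site paths to check against the rules above.
EXAMPLE_PATHS = [
    "/index.html",
    "/images/logo.png",
    "/css/site.css",
    "/admin/login",
]

if __name__ == "__main__":
    for path in blocked_paths(EXAMPLE_ROBOTS_TXT, EXAMPLE_PATHS):
        print("excluded by robots.txt:", path)
```

In this made-up case the images and stylesheet would be excluded along with the administrative section. A list like this can be a useful starting point when discussing with us which sections should or should not be collected.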

The Library hopes that you share its vision of preserving Web materials. If you have questions, comments or recommendations concerning the collection of your Web site by the Library of Congress, please e-mail the Library's Minerva Web Preservation Project at webcapture@loc.gov at your earliest convenience.

For more information about the Library's Web capture projects, please visit www.loc.gov/webcapture/.

 

September 28, 2006