|The Library of Congress >> MINERVA Home|
Notice to Webmasters
If crawling is impacting the performance of your site or you have other concerns, please email immediately:
and copy the Library of Congress project team at firstname.lastname@example.org.
About the Minerva Web Preservation Project
The United States Library of Congress preserves the Nation's cultural artifacts and provides enduring access to them. The Library's traditional functions (acquiring, cataloging, preserving and serving collection materials of historical importance to the Congress and to the American people to foster education and scholarship) extend to digital materials, including Web sites. The Library has selected your site for inclusion in the historic collection of Internet materials. An email notification with further information has been sent separately to a contact at your organization identified by our team. Rather than sending it to webmaster@ or info@ addresses and risking bounced or filtered messages, we identified contact information for site owners, managers, directors, etc. to ensure successful delivery.
How We Collect Your Web Site
The Library of Congress has contracted with the Internet Archive to collect content from Web sites at regular intervals as specified in the notification sent to your Web site. We have a number of ongoing collections; each notification states the specific project, be it the Election 2004 Web Archive, the War on Iraq Web Archive, or another thematic archive. The Internet Archive uses the Heritrix crawler to collect Web sites on behalf of the Library of Congress. For more information on Heritrix see http://crawler.archive.org/index.html
We bypass robots.txt (http://www.robotstxt.org/wc/robots.html) in order to get a complete representation in our archive. We crawl to the fullest scope to ensure our archives will represent your site accurately. Objects that we definitely want to include in our archive are any that contribute to the look and feel of your site, such as:
and any other object that contributes to the content or the presentation/navigation features of your site. We would prefer not to crawl anything that is not intended for the general public, such as administrative sections. We would be happy to discuss ways in which you can allow desired objects and content to be harvested by the Library of Congress or its agents, or disallow sections that we do not wish to collect.
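Because the crawl described above bypasses robots.txt, a robots.txt file does not by itself keep a section out of the archive. A webmaster who wants to tell the project team which sections are not intended for the general public could start from the rules the site already publishes. The following is a minimal sketch, assuming Python's standard urllib.robotparser module; the robots.txt content, the sample paths, and the excluded_paths helper are all hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; these paths are illustrative only
# and are not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""


def excluded_paths(robots_txt, candidate_paths, user_agent="*"):
    """Return the candidate paths that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [p for p in candidate_paths if not parser.can_fetch(user_agent, p)]


# Hypothetical sample of site paths to check against the rules.
paths = ["/index.html", "/images/logo.gif", "/admin/login", "/cgi-bin/stats"]
print(excluded_paths(ROBOTS_TXT, paths))
```

The resulting list is only a starting point for the discussion with the project team; it has no effect on the crawl itself.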
The Library hopes that you share its vision of preserving Web materials. If you have questions, comments or recommendations concerning the collection of your Web site by the Library of Congress, please e-mail the Library's Minerva Web Preservation Project at email@example.com at your earliest convenience.
For more information about the Library's Web capture projects, please visit www.loc.gov/webcapture/.
September 28, 2006