The Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office today announced a collaborative project to preserve public United States Government web sites at the end of the current presidential administration, which ends January 19, 2009. This harvest is intended to document federal agencies' online presence during the transition of government and to enhance the existing collections of the five partner institutions.
As part of this collaboration, the Internet Archive will undertake a comprehensive crawl of the .gov domain. The Library of Congress has been preserving congressional web sites on a monthly basis since December 2003 and will focus on development of this archive for the project. The University of North Texas and the California Digital Library will focus on in-depth crawls of specific government agencies. The project will also call upon government information specialists -- including librarians, political and social science researchers, and academics -- to assist in selecting and prioritizing the web sites to be included in the collection, and in determining the frequency and depth of collection. The Government Printing Office will lend expertise to the curation process along with libraries in its Federal Depository Library Program. A tool designed by the project team and developed by the University of North Texas to facilitate the collaborative work of these specialists will be made available to participants in August 2008.
"Digital government information is considered at-risk, with an estimated life span of 44 days for a web site. This collection will provide an historical record of value to the American people," said Director of Program Management Martha Anderson of the Library of Congress’ National Digital Information Infrastructure and Preservation Program (NDIIPP).
The Library of Congress, the world’s preeminent reservoir of knowledge, is leading a nationwide program to collect and preserve at-risk digital content of cultural and historical importance. The program, formally called the National Digital Information Infrastructure and Preservation Program (www.digitalpreservation.gov), is building a digital preservation network of partners. More information about the Library’s Web Capture program is available at www.loc.gov/webcapture; more information about the Library’s other resources can be found at www.loc.gov.
The California Digital Library (www.cdlib.org) leads the NDIIPP-funded Web-at-Risk project, which is developing tools that enable librarians and archivists to capture, curate, preserve, and provide access to web-based government and political information. In partnership with the University of California libraries, the California Digital Library established the digital preservation program to ensure long-term access to the digital information that supports and results from research, teaching and learning at the university.
The University of North Texas Libraries, as part of the Federal Depository Library Program, created the CyberCemetery (http://govinfo.library.unt.edu) in 1997 to capture and provide permanent public access to the web sites and publications of defunct U.S. government agencies and commissions. The University of North Texas participates in NDIIPP as a partner with the California Digital Library in the Web-at-Risk project, focusing on the selection of materials for capture and preservation.
The U.S. Government Printing Office manages the Federal Depository Library Program and is charged with providing permanent public access to government publications.
The Internet Archive is a high-tech nonprofit, founded in 1996 by Brewster Kahle as an “Internet library” to provide universal and permanent access to digital information for educators, researchers, historians, and the general public. The Internet Archive captures, stores and provides access to born-digital and digitized content, and leads the development of Heritrix, the open-source archival web crawler that is being used to collect web data for this project.