Software, E-Resource: 3000 .gov tabular dataset

About this Item

Title

  • 3000 .gov tabular dataset

Other Title

  • Dot gov tabular dataset
  • Three thousand dot gov tabular dataset

Names

  • Library of Congress Web Archiving Program

Created / Published

  • Washington, D.C. : Library of Congress Web Archiving Program, [2019]

Contents

  • [Comma-separated values (CSV) dataset]. -- [Tab-separated values (TSV) dataset]. -- [Excel (XLS) dataset].

Headings

  • Electronic government information--United States
  • Electronic spreadsheets--United States
  • Web archives--United States

Genre

  • Data sets

Notes

  • "Each of these datasets consist of 1,000 files generated from indexes of the Web archives, which were used to derive a random list of 1,000 items identified as CSV, tab-separated (TSV), or Excel (XLS) files and hosted on .gov domains. Each set includes 1,000 unique CSV, TSV, and XLS files and minimal metadata about them, including links to their locations within the Library's web archive."-- Web archive datasets website.
  • "Dataset originally created 11/6/2018."-- README file.
  • "This dataset is based on exploratory work begun by the Library of Congress's Web Archiving Team in 2018. The goal of the work is to explore the contents of the Library's web archives through analysis of the indexes containing metadata from the harvested web content, as stored in CDX files. The metadata contained in the indexes was used for initial analysis, rather than the archived content stored in WARC and ARC container files, since W/ARC files present significant challenges due to large size and high processing requirements. The CDX indexes used in this initial analysis were six terabytes (TB) in size, which is a fraction of the web archive content in W/ARC files constituting nearly 1.5 petabytes (PB) at the time of analysis (November 2018)."-- README file.
  • Title from Web Archive Datasets website, viewed February 16, 2021.
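
The notes above describe deriving a random list of 1,000 items per format from CDX index metadata rather than from the W/ARC files themselves. A minimal sketch of that kind of selection follows. It is not the Web Archiving Team's actual code: the function names, the media-type mapping, and the assumed space-delimited CDX layout (original URL in the third field, media type in the fourth) are all illustrative assumptions.

```python
import random
from urllib.parse import urlsplit

# Assumed media types for the three formats (illustrative, not exhaustive).
TABULAR_TYPES = {
    "text/csv": "csv",
    "text/tab-separated-values": "tsv",
    "application/vnd.ms-excel": "xls",
}


def is_gov(url):
    """Return True if the URL's host is under the .gov top-level domain."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:  # malformed URL in the index
        return False
    return host == "gov" or host.endswith(".gov")


def sample_tabular(cdx_lines, fmt, k=1000, seed=0):
    """Pick up to k unique .gov URLs of one format from CDX index lines.

    Assumes each line is space-delimited with the original URL in
    field 3 and the media type in field 4 (hypothetical layout).
    """
    unique_urls = set()
    for line in cdx_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to hold a URL and media type
        url, mime = fields[2], fields[3]
        if TABULAR_TYPES.get(mime) == fmt and is_gov(url):
            unique_urls.add(url)  # repeated captures collapse to one URL
    candidates = sorted(unique_urls)  # stable order so the seed is reproducible
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))
```

Calling `sample_tabular(lines, "csv")`, then again with `"tsv"` and `"xls"`, would yield three lists of up to 1,000 URLs each, mirroring the three sets named under Contents. Working from index lines alone keeps the pass cheap, which matches the README's stated reason for avoiding the much larger W/ARC files.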

Medium

  • Online resource (datasets)

Call Number/Physical Location

  • JF1525.A8

Repository

  • s-Online Electronic Resource

Library of Congress Control Number

  • 2020445557

Online Format

  • compressed data

Rights & Access

The Library of Congress is providing access to The Selected Datasets Collection for educational and research purposes. The Library has obtained permission for the use of many materials in the Collection, and presents additional materials for educational and research purposes in accordance with fair use under United States copyright law. Researchers should watch for modern documents that may be copyrighted (for example, published in the United States less than 95 years ago, or unpublished and the author died less than 70 years ago).

You are responsible for deciding whether your use of the items in this collection is legal. You are also responsible for securing any permissions needed to use the items. You will need written permission from the copyright owners of materials not in the public domain for distribution, reproduction, or other use of protected items beyond that allowed by fair use or other statutory exemptions. Some content may be protected under international law. You may also need permission from holders of other rights, such as publicity and/or privacy rights.

Credit Line: Library of Congress, Digital Collections Management and Services Division

Cite This Item

Citations are generated automatically from bibliographic data as a convenience, and may not be complete or accurate.

Chicago citation style:

Library of Congress Web Archiving Program. 3000 .gov Tabular Dataset. [Washington, D.C.: Library of Congress Web Archiving Program, 2019] Software, E-Resource. https://www.loc.gov/item/2020445557/.

APA citation style:

Library of Congress Web Archiving Program. (2019) 3000 .gov Tabular Dataset. [Washington, D.C.: Library of Congress Web Archiving Program] [Software, E-Resource] Retrieved from the Library of Congress, https://www.loc.gov/item/2020445557/.

MLA citation style:

Library of Congress Web Archiving Program. 3000 .gov Tabular Dataset. [Washington, D.C.: Library of Congress Web Archiving Program, 2019] Software, E-Resource. Retrieved from the Library of Congress, <www.loc.gov/item/2020445557/>.