Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | TSV, Tab-Separated Values |
---|---|
Description |
A tab-separated values (TSV) file is a text format whose primary function is to store data in a table structure, where each record in the table is recorded as one line of the text file. The field values in a record are separated by tab characters. Header rows may provide information about the semantics of table columns. TSV files function well as a data exchange format between programs that use structured tables or spreadsheets. The tab-separated fields may contain a variety of data, including text and mathematical, statistical, or scientific data. The TSV file format is widely supported and is very similar to the CSV file formats, though data fields stored in CSV files are separated by commas rather than tab characters. Both are types of delimiter-separated values format. For more information on delimiter-separated values formats and the differences between TSV and CSV, see Notes below.

As documented in IANA's description, a TSV file encodes a number of records that may contain multiple fields. Fields that contain tabs are not allowed. IANA's example of a TSV file is structured as follows:

Name[TAB]Age[TAB]Address
Paul[TAB]23[TAB]1115 W Franklin
Bessy the Cow[TAB]5[TAB]Big Farm Way
Zeke[TAB]45[TAB]W Main St

As mentioned above, field values cannot contain tabs or newline characters, so conversion of plain text to TSV requires the following escapes (with the corresponding ASCII codes in parentheses):

\n for newline (ASCII 0x0a)
\t for tab (ASCII 0x09)
\r for carriage return (ASCII 0x0d)
\\ for backslash (ASCII 0x5c)

TSV files can easily be exported to other formats such as CSV, XLS, and XLSX using common spreadsheet programs. |
Production phase | May be used at any stage in the lifecycle of a dataset. |
Relationship to other formats | |
Affinity to | CSV_strict, CSV, Comma Separated Values (RFC 4180) |
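The escaping rules listed in the Description can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names (`escape_field`, `unescape_field`, `format_row`, `parse_row`) are hypothetical, and the choice to keep unrecognized escape sequences verbatim is an assumption of this sketch, since no specification defines that case.

```python
# Escapes from the Description: newline, tab, carriage return, backslash.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
_UNESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def escape_field(value: str) -> str:
    """Replace each special character with its two-character escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value: str) -> str:
    """Reverse escape_field, scanning one character at a time."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Assumption: an unknown escape is kept as written.
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def format_row(fields) -> str:
    """Join escaped fields with tabs to form one TSV record (no line ending)."""
    return "\t".join(escape_field(f) for f in fields)


def parse_row(line: str) -> list:
    """Split one TSV record on tabs and unescape each field."""
    return [unescape_field(f) for f in line.rstrip("\r\n").split("\t")]
```

Because every tab and newline inside a value is escaped before joining, a plain split on the tab character is enough to recover the fields, and `parse_row(format_row(fields))` returns the original list of strings.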
LC experience or existing holdings | As of February 2021, the Library of Congress has about 15,000 files in the TSV format inventoried in its digital storage system. These may come from collections material as well as Library-created content. |
---|---|
LC preference | The Library of Congress Recommended Formats Statement (RFS) includes TSV as a preferred format for datasets. |
Disclosure |
IANA's registration of the TSV file format MIME type says of a specification: “Doesn’t really need any.” No official specification exists. |
---|---|
Documentation | IANA's registration serves as the de facto specification, although nothing more formal exists. |
Adoption |
Widely supported and used format for data exchange. The TSV file format is used as an alternative to CSV because tab characters are far less likely than commas to occur in text. fileinfo.com's TSV entry lists numerous software programs across Windows, Mac, and Linux platforms that allow users to open and manipulate TSV files. The Windows programs include File Viewer Plus, Microsoft Excel 365, LibreOffice, OpenOffice Calc, Microsoft Notepad, and other generic text editors. Mac applications that open TSV files include Microsoft Excel 365, LibreOffice, OpenOffice Calc, and MacroMates TextMate, as well as generic text editors. Linux applications that open TSV files include LibreOffice, OpenOffice Calc, and text editors. TSV is a recommended and supported file format in the Edinburgh DataShare and Edinburgh DataVault at the University of Edinburgh. |
Licensing and patents | None. |
Transparency |
A simple text-based format that is both human-readable and easily machine-processable, therefore very transparent. |
Self-documentation |
Poor. Header rows may provide information about the semantics of columns. |
Accessibility Features |
Accessibility features for datasets and databases typically involve conformance to W3C's guidelines for page structure, tables, and forms. In practical terms, this means that pages (if applicable to the dataset) should be well structured, with regions and headings identified and content marked up or tagged with appropriate and meaningful elements; tables should be organized in grids with labeled header cells and data cells whose relationship is defined; and forms (if applicable to the dataset) should validate user input, provide options to undo changes and confirm data entry, notify users about successful task completion and any errors, and provide instructions to help them correct mistakes. Each of these criteria should be supported by text accessible to a screen reader. As described in CSV, support for accessibility features is generally poor because there is no defined way to indicate that the first row and column contain header cells. Overall, there is limited native support, but applications can add options such as those described in Make your Excel documents accessible to people with disabilities. Comments welcome. |
External dependencies | None. |
Technical protection considerations | None. |
Dataset | |
---|---|
Normal functionality | A simple format with limited capabilities. The format does not allow tab characters within fields. |
Support for software interfaces (APIs, etc.) | The simple nature of the TSV format makes it straightforward to write programs that parse and use the data. |
Data documentation (quality, provenance, etc.) | No support. |
Beyond normal functionality | None. |
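As a rough illustration of the point about software interfaces, the sketch below parses TSV text into records keyed by the header row using only string splitting. The function name `read_tsv` and the `SAMPLE` constant are hypothetical. The sketch assumes the first line is a header row, skips completely empty lines, and does not interpret escape sequences.

```python
def read_tsv(text: str) -> list[dict[str, str]]:
    """Parse TSV text into a list of dicts keyed by the header row."""
    lines = [line.rstrip("\r") for line in text.split("\n")]
    lines = [line for line in lines if line]  # assumption: blank lines carry no data
    if not lines:
        return []
    header = lines[0].split("\t")
    records = []
    for number, line in enumerate(lines[1:], start=2):
        values = line.split("\t")
        if len(values) != len(header):
            raise ValueError(
                f"record {number} has {len(values)} fields, expected {len(header)}"
            )
        records.append(dict(zip(header, values)))
    return records


# The IANA example from the Description, with real tab characters.
SAMPLE = (
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Bessy the Cow\t5\tBig Farm Way\n"
    "Zeke\t45\tW Main St\n"
)

records = read_tsv(SAMPLE)
```

All values come back as strings; converting a column such as `Age` to a number is left to the caller, since the format itself carries no type information.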
Tag | Value | Note |
---|---|---|
Filename extension | tsv | TSV (Tab-Separated Values) |
Filename extension | tab | See https://www.wikidata.org/wiki/Q3513566. |
Internet Media Type | text/tab-separated-values | See Internet Media type description at IANA. |
Other | NF00418 | See https://www.archives.gov/files/lod/dpframework/id/NF00418.ttl |
Pronom PUID | x-fmt/13 | See http://www.nationalarchives.gov.uk/PRONOM/x-fmt/13. |
Wikidata Title ID | Q3513566 | See https://www.wikidata.org/wiki/Q3513566. |
General |
Both TSV and CSV file formats are examples of delimiter-separated-values formats that "store two-dimensional arrays of data by separating the values within each row with specific delimiter characters." The difference in delimiters (tabs for TSV files and commas for CSV files) affects how the data within each file is read and parsed by other software tools, as explained in this GitHub post. CSV files use an escape syntax, which allows CSV to better represent common written text. This poses a problem for some software programs, as it is easy to parse the escape syntax incorrectly. Processing TSV files is simpler because built-in Unix tools, including sort, awk, and diff, can work with TSV data directly. These Unix utilities do not process the CSV escape syntax. Programming languages such as Python can easily parse CSV or TSV files with appropriate modules, such as pandas. The pandas module can read CSV or TSV input files and parse the data for the desired output. As mentioned in Identification and Description above, TSV files can be exported to other spreadsheet formats. |
---|---|
History |
|
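To illustrate the General note on how the TSV and CSV delimiters affect parsing, the following sketch converts TSV text to CSV using Python's standard `csv` module. The function name `tsv_to_csv` and the sample value are hypothetical. The sketch assumes the input has no quoting syntax, so double quotes are read as ordinary characters, and it leaves any backslash escapes untouched.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert TSV text to CSV text, letting the csv module add quoting."""
    # QUOTE_NONE: treat double quotes in the TSV input as plain characters.
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    out = io.StringIO(newline="")
    # The writer's default dialect quotes only fields that need it.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


converted = tsv_to_csv('Name\tAddress\nPaul\t1115 W Franklin, Apt "B"\n')
```

A field containing a comma and double quotes passes through TSV unchanged, but in the CSV output it must be wrapped in quotes with the inner quotes doubled, which is the escape syntax that line-oriented tools such as sort and awk do not interpret. With pandas installed, `pandas.read_csv(path, sep="\t")` is a common alternative for loading TSV into a table.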