Sustainability of Digital Formats: Planning for Library of Congress Collections


TSV, Tab-Separated Values

Format Description Properties

Identification and description

Full name TSV, Tab-Separated Values
Description

A tab-separated values (TSV) file is a text format whose primary function is to store data in a table structure, where each record in the table is recorded as one line of the text file. The field values in each record are separated by tab characters. Header rows may provide information about the semantics of table columns. TSV files function well as a data exchange format between programs that use structured tables or spreadsheets. The fields may contain a variety of data, including text and mathematical, statistical, or scientific data. The TSV file format is widely supported and is very similar to the CSV file format, except that data fields in CSV files are separated by commas rather than tab characters. Both are types of delimiter-separated values format. For more information on delimiter-separated values formats and the differences between TSV and CSV, see Notes below.

As documented in IANA's description, a TSV file encodes a number of records, each of which may contain multiple fields. Fields that contain tabs are not allowed. IANA's example of a TSV file, shown here in plain text with [TAB] standing for a tab character, is structured as follows:

Name[TAB]Age[TAB]Address

Paul[TAB]23[TAB]1115 W Franklin

Bessy the Cow[TAB]5[TAB]Big Farm Way

Zeke[TAB]45[TAB]W Main St
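
As an illustration only, the example above can be read with a few lines of Python. This is a minimal sketch, not part of IANA's description: the function name parse_tsv is hypothetical, the [TAB] markers are replaced by real tab characters, and the first line is assumed to be a header row.

```python
# Minimal sketch: parse the example table, assuming the first line is a header row.
# parse_tsv is a hypothetical helper name, not part of any specification.
SAMPLE = (
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Bessy the Cow\t5\tBig Farm Way\n"
    "Zeke\t45\tW Main St\n"
)


def parse_tsv(text):
    """Return one dict per record, keyed by the column names in the header row."""
    lines = [line for line in text.split("\n") if line]  # skip empty lines
    header = lines[0].split("\t")
    # Every value stays a string; TSV itself carries no type information.
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


records = parse_tsv(SAMPLE)
print(records[1]["Name"])
print(records[2]["Address"])
```

Note that values such as the ages remain plain strings; any typing of columns is left to the consuming program.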

As mentioned above, field values cannot contain tab or newline characters, so converting plain text to TSV requires the following escapes (with the corresponding ASCII codes in parentheses):

\n for newline (ASCII 0x0a)

\t for tab (ASCII 0x09)

\r for carriage return (ASCII 0x0d)

\\ for backslash (ASCII 0x5c)
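
The escapes listed above can be sketched in Python as follows. This is a hedged illustration rather than a reference implementation: the function names escape_field and unescape_field are hypothetical, and the sketch assumes that a backslash followed by any other character is left unchanged when unescaping.

```python
# Minimal sketch of the four escapes listed above.
# escape_field and unescape_field are hypothetical helper names.

def escape_field(value):
    """Replace backslash, tab, newline, and carriage return with their escapes."""
    # Backslash goes first so the backslashes added by later steps are not doubled.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


_UNESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def unescape_field(value):
    """Reverse escape_field by scanning left to right, one escape pair at a time."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value) and value[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[value[i + 1]])
            i += 2
        else:
            # Assumption: an unrecognized or trailing backslash is kept as-is.
            out.append(value[i])
            i += 1
    return "".join(out)


original = "line one\nline two\twith a tab and a \\ backslash"
escaped = escape_field(original)
print(escaped)
print(unescape_field(escaped) == original)
```

Scanning left to right when unescaping, instead of chaining replacements, avoids misreading an escaped backslash followed by the letter n as a newline.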

TSV files can easily be exported into other formats like CSV, XLS, XLSX using common spreadsheet programs.
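
Conversion can also be done programmatically. The following is a minimal sketch using the csv module from the Python standard library; the function name tsv_to_csv and the sample address are hypothetical, and the sketch assumes the TSV input contains no quoting conventions of its own.

```python
import csv
import io

# Minimal sketch: convert TSV text to CSV text with the standard library.
# tsv_to_csv is a hypothetical helper name.

def tsv_to_csv(tsv_text):
    """Read tab-separated text and return the same table as comma-separated text."""
    # QUOTE_NONE makes the reader treat quote characters in the TSV as ordinary data.
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    out = io.StringIO(newline="")
    # The writer quotes any field that contains a comma, as CSV requires.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(tsv_to_csv("Name\tAddress\nPaul\t1115 W Franklin, Apt 2\n"))
```

A field that contains a comma is written inside double quotes in the CSV output, while fields without commas are written unchanged.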

Production phase May be used at any stage in the lifecycle of a dataset.
Relationship to other formats
    Affinity to CSV_strict, CSV, Comma Separated Values (RFC 4180)

Local use

LC experience or existing holdings As of February 2021, the Library of Congress had about 15,000 files in the TSV format inventoried in its digital storage system. These may include collections material as well as Library-created content.
LC preference The Library of Congress Recommended Formats Statement (RFS) includes TSV as a preferred format for datasets.

Sustainability factors

Disclosure

IANA's registration of the TSV file format MIME type describes the specification as: “Doesn’t really need any.” No official specification exists.

    Documentation IANA's registration serves as the de facto specification; nothing more formal exists.
Adoption

A widely supported and widely used format for data exchange. TSV is used as an alternative to CSV because tab characters are far less likely than commas to occur within text values.

fileinfo.com's TSV entry lists numerous software programs across Windows, Mac, and Linux platforms that allow users to open and manipulate TSV files. The Windows programs include File Viewer Plus, Microsoft Excel 365, LibreOffice, OpenOffice Calc, Microsoft Notepad, and other generic text editors. Mac applications that open TSV files include Microsoft Excel 365, LibreOffice, OpenOffice Calc, MacroMates TextMate, and generic text editors. On Linux, LibreOffice, OpenOffice Calc, and text editors allow users to open TSV files.

TSV is a recommended and supported file format in the Edinburgh DataShare and Edinburgh DataVault at the University of Edinburgh.

    Licensing and patents None.
Transparency

A simple text-based format that is both human-readable and easily machine-processable, and therefore very transparent.

Self-documentation

Poor. Header rows may provide information about the semantics of columns.

Accessibility Features

Accessibility features for datasets and databases typically involve conformance to W3C's guidelines for page structure, tables, and forms. In practical terms, this means that pages (if applicable to the dataset) should be well structured, with regions and headings identified and content marked up or tagged using appropriate and meaningful elements; tables should be organized as grids with labeled header cells and data cells whose relationships are defined; and forms (if applicable to the dataset) should validate user input, provide options to undo changes and confirm data entry, notify users of successful task completion and of any errors, and provide instructions to help them correct mistakes. Each of these criteria should be supported by text accessible to a screen reader.

As described in CSV, support for accessibility features is generally poor because there is no defined way to indicate that the first row and column contain header cells. Overall, there is limited native support, but applications can add options such as those described in Make your Excel documents accessible to people with disabilities. Comments welcome.

External dependencies None.
Technical protection considerations None.

Quality and functionality factors

Dataset
Normal functionality A simple format with limited capabilities. The format does not allow tab characters within fields.
Support for software interfaces (APIs, etc.) The simple nature of the TSV format makes it straightforward to write programs that parse and use the data.
Data documentation (quality, provenance, etc.) No support.
Beyond normal functionality None.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | tsv | TSV (Tab-Separated Values)
Filename extension | tab | See https://www.wikidata.org/wiki/Q3513566.
Internet Media Type | text/tab-separated-values | See Internet Media type description at IANA.
Other | NF00418 | See https://www.archives.gov/files/lod/dpframework/id/NF00418.ttl
Pronom PUID | x-fmt/13 | See http://www.nationalarchives.gov.uk/PRONOM/x-fmt/13.
Wikidata Title ID | Q3513566 | See https://www.wikidata.org/wiki/Q3513566.

Notes

General

Both TSV and CSV file formats are examples of delimiter-separated-values formats that "store two-dimensional arrays of data by separating the values within each row with specific delimiter characters."

The difference in delimiters between the two formats (tabs for TSV files and commas for CSV files) affects how the data within each file is read and parsed by other software tools, as explained in this GitHub post. CSV files use an escape syntax, which allows them to better represent common written text. This poses a problem for some software programs, as it is easy to parse the escape syntax incorrectly. Processing TSV files is simpler because built-in Unix tools, including sort, awk, and diff, can work with TSV data directly. These Unix utilities do not process the CSV escape syntax. Programming languages such as Python can easily parse CSV or TSV files when the appropriate modules, such as pandas, are installed. The pandas module can read CSV or TSV input files and parse the data for the desired output. As mentioned in Identification and Description above, TSV files can also be exported to other spreadsheet formats.
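
The parsing difference can be illustrated with a short Python sketch using only the standard library. The sample record, including the comma in the address, is hypothetical and chosen only to show why a CSV line needs an escape-aware parser while the equivalent TSV line can be split on tabs.

```python
import csv

# Hypothetical record whose address contains a comma.
csv_line = 'Paul,23,"1115 W Franklin, Apt 2"'
tsv_line = "Paul\t23\t1115 W Franklin, Apt 2"

# Naive splitting on commas ignores the CSV escape syntax and breaks the address apart.
naive_csv = csv_line.split(",")

# A CSV-aware parser honors the double quotes and keeps the address in one field.
parsed_csv = next(csv.reader([csv_line]))

# The TSV line needs no escape handling: splitting on tabs is enough.
tsv_fields = tsv_line.split("\t")

print(len(naive_csv), len(parsed_csv), len(tsv_fields))
```

Splitting on a single delimiter character is essentially what line-oriented tools do, which is why the TSV form is the simpler one for them to handle.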

History  

Format specifications


Useful references

URLs


Last Updated: 05/09/2024