Sustainability of Digital Formats: Planning for Library of Congress Collections
|Full name||TSV, Tab-Separated Values|
A tab-separated values (TSV) file is a text format whose primary function is to store data in a table structure in which each record in the table is recorded as one line of the text file. The field values within each record are separated by tab characters. Header rows may provide information about the semantics of table columns. TSV files function well as a data exchange format between programs that use structured tables or spreadsheets. These tab-separated value fields may contain a variety of data, including text, mathematical, statistical, or scientific data. The TSV file format is widely supported and is very similar to CSV file formats, though data fields stored in CSV files are separated by commas rather than tab characters. Both are delimiter-separated values formats. For more information on delimiter-separated values formats and the differences between TSV and CSV, see Notes below.
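The following is a minimal sketch of the record-and-field structure described above. It reads TSV text that begins with a header row, using Python's standard-library csv module, which is one possible tool and not one the format requires. The column names and values are hypothetical. The reader uses `quoting=csv.QUOTE_NONE` so that double quotes are treated as ordinary characters, since IANA's TSV description defines no quoting convention.

```python
import csv
import io

# Hypothetical TSV text: one record per line, fields separated by tabs,
# and an (optional) header row naming the columns.
tsv_text = (
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Zeke\t45\tW Main St\n"
)

# DictReader takes the first row as field names; QUOTE_NONE keeps quote characters literal.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

# TSV carries no data types: every value is a string, so conversion is up to the program.
ages = [int(row["Age"]) for row in rows]
print(rows[0]["Address"], sum(ages))
```

The header row is what makes the column names available here. Without one, a program has to know the column order from some other source.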
As documented in IANA's description, a TSV file encodes a number of records, each of which may contain multiple fields. Fields that contain tabs are not allowed. IANA's example of a TSV file is structured as follows, with [TAB] marking each tab character:
Paul[TAB]23[TAB]1115 W Franklin
Bessy the Cow[TAB]5[TAB]Big Farm Way
Zeke[TAB]45[TAB]W Main St
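The following Python sketch reads IANA's example under the rules above. Each line is treated as one record, and fields are split on the tab character. The function name `parse_tsv` is hypothetical. Two further behaviours are illustrative assumptions, not requirements taken from IANA's registration: the function rejects records whose field count differs from the first record, and it skips blank lines.

```python
def parse_tsv(text: str) -> list[list[str]]:
    """Split TSV text into records (lines) and fields (tab-separated values).

    Hypothetical helper: assumes fields contain no escapes, skips blank lines,
    and requires every record to have as many fields as the first one.
    """
    records: list[list[str]] = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate CRLF line endings
        if not line:
            continue  # e.g. the empty string after a trailing newline
        fields = line.split("\t")
        if records and len(fields) != len(records[0]):
            raise ValueError(
                f"line {line_number}: expected {len(records[0])} fields, got {len(fields)}"
            )
        records.append(fields)
    return records


# IANA's example, with real tab characters in place of [TAB].
example = "Paul\t23\t1115 W Franklin\nBessy the Cow\t5\tBig Farm Way\nZeke\t45\tW Main St\n"
for record in parse_tsv(example):
    print(record)
```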
Because each record occupies one line and fields are delimited by tabs, field values cannot contain tab or newline characters. Converting plain text to TSV therefore requires the following escapes (with the corresponding ASCII codes in parentheses):
\n for newline (ASCII 0x0a)
\t for tab (ASCII 0x09)
\r for carriage return (ASCII 0x0d)
\\ for backslash (ASCII 0x5c)
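The escape list above can be sketched in Python as a pair of functions. The names `escape_field` and `unescape_field` are hypothetical. The backslash itself is escaped so that decoding is unambiguous. Decoding is done in a single left-to-right pass, so an escaped backslash followed by `n` is not mistaken for a newline escape.

```python
# Characters that cannot appear raw in a TSV field, mapped to their escapes.
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
# Reverse mapping: the character following a backslash -> the original character.
UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r"}


def escape_field(value: str) -> str:
    """Replace backslash, newline, tab, and carriage return with escape sequences."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value: str) -> str:
    """Reverse escape_field in one left-to-right pass."""
    out = []
    i = 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value) and value[i + 1] in UNESCAPES:
            out.append(UNESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(value[i])  # ordinary character, or a lone trailing backslash
            i += 1
    return "".join(out)


# A hypothetical record whose values contain a newline and a tab.
row = ["Paul", "line one\nline two", "col\tumn"]
line = "\t".join(escape_field(f) for f in row)
print(line)  # one physical line; the only raw tabs are the two field separators
```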
TSV files can easily be converted to other formats, such as CSV, XLS, and XLSX, using common spreadsheet programs.
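Besides spreadsheet programs, the TSV-to-CSV conversion can also be scripted. The following minimal sketch uses Python's standard-library csv module, which is one choice among many. The function name `tsv_to_csv` is hypothetical. Reading with `quoting=csv.QUOTE_NONE` assumes the input follows IANA's rules, with no quoting and no tabs inside fields. The writer's default minimal quoting then adds double quotes only where a value contains a comma, a quote, or a line break.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert TSV text to CSV text (hypothetical helper, stdlib csv module only)."""
    # Treat double quotes in the TSV input as ordinary characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default "excel" dialect: comma delimiter, minimal quoting, "\r\n" line endings.
    writer = csv.writer(out)
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical input; the second address contains a comma, so CSV must quote it.
tsv = "Paul\t23\t1115 W Franklin\nZeke\t45\tW Main St, Apt 2\n"
print(tsv_to_csv(tsv))
```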
|Production phase||May be used at any stage in the lifecycle of a dataset.|
|Relationship to other formats|
|Affinity to||CSV_strict, CSV, Comma Separated Values (RFC 4180)|
|LC experience or existing holdings||As of February 2021, the Library of Congress has about 15,000 files in the TSV format inventoried in its digital storage system. These may include collection materials as well as Library-created content.|
|LC preference||The Library of Congress Recommended Formats Statement (RFS) includes TSV as a preferred format for datasets.|
IANA's registration of the TSV MIME type describes the specification only as “Doesn’t really need any.” No official specification exists.
|Documentation||IANA's registration serves as the de facto specification although there is nothing more formalized.|
A widely supported and widely used format for data exchange. TSV is used as an alternative to CSV because tab characters are less likely than commas to occur within text.
fileinfo.com's TSV entry lists numerous software programs across Windows, Mac, and Linux platforms that allow users to open and manipulate TSV files. Windows programs include File Viewer Plus, Microsoft Excel 365, LibreOffice, OpenOffice Calc, Microsoft Notepad, and other generic text editors. Mac applications that open TSV files include Microsoft Excel 365, LibreOffice, OpenOffice Calc, MacroMates TextMate, and generic text editors. On Linux, applications including LibreOffice, OpenOffice Calc, and text editors allow users to open TSV files.
TSV is a recommended and supported file format in MIT’s DSpace implementation as well as the Edinburgh DataShare and Edinburgh DataVault at the University of Edinburgh.
|Licensing and patents||None.|
A simple text-based format that is both human-readable and easily machine-processable, and therefore very transparent.
|Self-documentation||Poor. Header rows may provide information about the semantics of columns.|
|Technical protection considerations||None.|
|Normal functionality||A simple format with limited capabilities. The format does not allow tab characters within fields.|
|Support for software interfaces (APIs, etc.)||The simplicity of the TSV format makes it straightforward to write programs that parse and use the data.|
|Data documentation (quality, provenance, etc.)||No support.|
|Beyond normal functionality||None.|
||TSV (Tab-Separated Values)|
|Internet Media Type||text/tab-separated-values|
||See Internet Media Type description at IANA.|
|Wikidata Title ID||Q3513566|
Both TSV and CSV file formats are examples of delimiter-separated-values formats that "store two-dimensional arrays of data by separating the values within each row with specific delimiter characters."
The difference in delimiters between the two formats (tabs for TSV files and commas for CSV files) affects how other software tools read and parse the data within each file, as explained in this GitHub post. CSV files use an escape syntax, enclosing fields in double quotes, which allows them to better represent common written text that itself contains commas. This poses a problem for some software programs, because the escape syntax is easy to parse incorrectly. Processing TSV files is simpler because built-in Unix tools such as sort, awk, and diff can parse TSV data, whereas these utilities do not process the CSV escape syntax. Programming languages such as Python can also easily parse CSV or TSV files with appropriate modules installed, such as pandas, which can read CSV or TSV input files and parse the data for the desired output. As mentioned in Identification and Description above, TSV files can also be exported to other spreadsheet formats.
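The following short Python sketch illustrates why the CSV escape syntax is easy to parse incorrectly while TSV is not. The record is hypothetical. The example uses the standard-library csv module rather than pandas to stay dependency-free. With pandas, the corresponding call would typically be `pandas.read_csv(path, sep="\t")`.

```python
import csv
import io

# A hypothetical record whose address contains a comma.
csv_line = 'Zeke,45,"W Main St, Apt 2"'
tsv_line = "Zeke\t45\tW Main St, Apt 2"

# Naive comma splitting ignores CSV quoting and produces the wrong number of fields.
naive = csv_line.split(",")
print(len(naive))  # 4

# A CSV-aware parser applies the quoting rules and recovers the three fields.
parsed = next(csv.reader(io.StringIO(csv_line)))
print(parsed)  # ['Zeke', '45', 'W Main St, Apt 2']

# The TSV line needs no quoting rules: splitting on tabs is sufficient.
print(tsv_line.split("\t"))  # ['Zeke', '45', 'W Main St, Apt 2']
```

The same property is what lets line- and field-oriented Unix tools handle TSV directly. A tab is always a separator, so no tool has to track whether it is inside a quoted field.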