Sustainability of Digital Formats: Planning for Library of Congress Collections


Microsoft Office Excel 97-2003 Binary File Format (.xls, BIFF8)

Format Description Properties Explanation of format description terms

Identification and description

Full name Microsoft Office Excel 97-2003 Binary File Format (.xls). Also known as BIFF8.
Description

The Microsoft Excel Binary File format, with the .xls extension and referred to as XLS or MS-XLS, was the default format used for spreadsheets in Excel through Microsoft Office 2003. The format is also referred to as Binary Interchange File Format (BIFF) in Microsoft's technical documentation. This format description is primarily for version 8 of BIFF (BIFF8), introduced with Excel 97 in 1997. Although it cannot support the latest functionality of the Excel application, BIFF8 has continued to be available as an alternative to the XLSX/OOXML format, standardized as ISO/IEC 29500, for saving spreadsheet files in Excel. As of late 2019, the documentation for File formats that are supported in Excel, from Microsoft, lists two variants of XLS format, distinguishing between "Excel 5.0/95 Binary file format" and "Excel 97-Excel 2003 Binary file format." These correspond to BIFF5 and BIFF8, respectively. See Notes below for more detail on versions of BIFF.

Although the XLS format is proprietary, since 2007 it has been covered by Microsoft's Open Specification Promise. The specification released in 2007 is available as Microsoft Office Excel 97-2007 Binary File Format Specification [*.xls]. Since 2008, the structure for the XLS format used since Excel 97 has been kept up-to-date at [MS-XLS].

The structure of an XLS file since 1993 (BIFF5/Excel 5.0) is an OLE (object linking and embedding) compound file as specified in [MS-CFB]. A CFB provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data. It consists of storages, streams, and substreams. Each binary stream or substream is written as a series of binary records. An XLS file must contain a single Workbook stream which has a single Globals Substream with at least one sheet substream, which could be a Worksheet Substream, Chart Sheet Substream, Macro Sheet Substream, or Dialog Sheet Substream. These streams and substreams employ BIFF8 encoding for component binary records. A substream has a BoF (beginning of file) record that includes an indicator for the BIFF version. Hence the mandatory Workbook Globals Substream can be used to recognize the BIFF version. An XLS file in BIFF8 (or BIFF5) encoding begins as follows, with all values given as they occur in the physical file, for example when viewed using a Hex dump utility:

  • CFB header:
    • Header Signature for the CFB format with 8-byte Hex value D0CF11E0A1B11AE1. Gary Kessler notes that the beginning of this string looks like "DOCFILE"
    • 16 bytes of zeroes
    • 2-byte Hex value 3E00 indicating CFB minor version 3E
    • 2-byte Hex value 0300 indicating CFB major version 3 or value 0400 indicating CFB major version 4. [Note: All XLS files created recently by compilers of this resource (in versions of Excel for MacOS and Windows) and examined with a Hex dump utility have been based on CFB major version 3. Comments welcome.]
    • 2-byte Hex value FEFF indicating little-endian byte order for all integer values. This byte order applies to all CFB files.
    • 2-byte Hex value 0900 (indicating the sector size of 512 bytes used for major version 3) or 0C00 (indicating the sector size of 4096 bytes used for major version 4)
    • 480 bytes for remainder of the 512-byte header, which fills the first sector for a CFB of major version 3
    • For a CFB of major version 4, the rest of the first sector, 3,584 bytes of zeroes
  • BoF (beginning of file) record for the mandatory Workbook Globals Substream, which must be the first substream in a BIFF8 XLS file:
    • 2-byte BoF record number field. Hex value 0908. 09 indicates a BoF record. 08 indicates the BIFF version.
    • 2 bytes unspecified
    • BoF record data, starting with 2-byte Hex value 0006, indicating BIFF8
    • 2-byte Hex value 0500, indicating that the substream for which this is the record data is the mandatory Workbook Globals Substream
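
The CFB header fields listed above can be checked programmatically. The following is a minimal Python sketch, not a validator: it decodes only the first 32 bytes described in the list, and the function name and dictionary keys (parse_cfb_header, sector_size, and so on) are illustrative choices, not names taken from [MS-CFB].

```python
import struct

# 8-byte CFB header signature, as it appears in the physical file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(data: bytes) -> dict:
    """Decode the leading CFB header fields from the start of a file."""
    if len(data) < 32:
        raise ValueError("need at least the first 32 bytes of the file")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("CFB header signature not found")
    # Bytes 8-23 are the 16 bytes of zeroes. The four 2-byte fields that
    # start at offset 24 are little-endian unsigned integers.
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", data, 24)
    return {
        "minor_version": minor,                  # 0x003E, stored as 3E 00
        "major_version": major,                  # 3 or 4
        "little_endian": byte_order == 0xFFFE,   # stored as FE FF
        "sector_size": 1 << sector_shift,        # 9 -> 512, 12 -> 4096
    }
```

Passing the first 32 bytes of an XLS file saved from a recent version of Excel would be expected, on the evidence reported above, to yield major version 3 and a sector size of 512.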

Following the Workbook Globals Substream, the actual data is stored in substreams for Sheets, Charts, Macros, etc., also in BIFF8 format.

The XLS format was superseded as the default format for Microsoft Excel starting with Excel 2007 by XLSX/OOXML, the primary XML-based spreadsheet format of the Office Open XML (OOXML) family.

Production phase Can be used in any production phase: for creating documents (initial state); for editing and review (middle state); and for final distribution.
Relationship to other formats
    Subtype of CFB_3, Microsoft Compound File Binary File Format, Version 3. The compilers of this resource have experimented with saving spreadsheets as XLS files in several recent versions of Excel. In all cases, the resulting file was in version 3 of CFB. Comments welcome.
    Has later version XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2016, ECMA-376, Editions 1-5

Local use

LC experience or existing holdings Library staff creating spreadsheets as part of their duties typically use the XLSX format. As of late 2020, the Library of Congress has over 389,000 files with the .xls extension in its digital collections, with a total size of over 226 gigabytes. The corresponding figures for the .xlsx extension are over 685,000 files with a total size of over 722 gigabytes. These files come from several different sources. Some may be datasets acquired individually or as supplements to published articles. Other sources are archived websites and files acquired by the Manuscript Division in collections of "papers" from individuals or organizations.
LC preference For works acquired for its collections, the Library of Congress Recommended Formats Statement for Datasets includes XLS (.xls) as a preferred format for datasets.

Sustainability factors

Disclosure The Microsoft Excel Binary XLS file format is proprietary but openly documented and covered by Microsoft's Open Specification Promise.
    Documentation The specification is available at [MS-XLS]: Excel (.xls) Binary File Format. This document is updated quite frequently; changes are documented.
Adoption

Very widely used. The Market for Spreadsheets, an extract from Chapter 8 of Winners, Losers, and Microsoft: Competition and Antitrust in High Technology (2001) by Stan J. Liebowitz found that the market share of Excel in sales of spreadsheet software grew steadily in the decade between 1988 and 1997. Excel's dollar share overtook that of Lotus 1-2-3 in 1993, and by 1997, was 90%. Thus, when the BIFF8 version of the XLS format was introduced, Excel dominated the spreadsheet software market. Excel continues to be the market leader for professional spreadsheet use. See, for example, Is Excel the best spreadsheet software available?, a 2018 question with answers from Quora. Most new spreadsheets created using Excel will be in the default XLSX/OOXML format, but a number of heavy spreadsheet users choose to use the XLS or XLSB formats for faster loading and saving.

The corpus of existing spreadsheet documents on the open web has roughly equal numbers of the binary XLS format and the XML-based XLSX format. A Google search in December 2019 of the U.S. web by filetype yielded: .xls, 8,450,000; .xlsx, 7,630,000; .ods, 154,000. The compilers of this resource acknowledge that searches of the web are not a reliable measure of adoption for spreadsheet file formats at the initial (creation) phase of a content lifecycle. Most spreadsheets are private and those that are made available on the web are likely to be converted to the format considered most likely to be usable by the intended audience.

All mainstream spreadsheet applications can import files in the BIFF8 version of the XLS format. This includes: LibreOffice Calc, Apache OpenOffice Calc, Quattro Pro (now part of Corel WordPerfect Office), Google Sheets, and Apple's Numbers. See also table of supported formats in spreadsheet software from Wikipedia.

The binary XLS format, particularly BIFF8, appears relatively frequently on lists of acceptable formats for archiving of data. For example, see recommendations from the Library of Congress, the UK Data Service, the National Archives of Australia, and the U.S. National Archives (NARA). In this context, the assumption is usually that the data per se is stored in a worksheet as a rectangular grid with columns representing variables/measurements and rows representing observations. Note that recommended practice for archiving datasets always calls for a "codebook" or other documentation that explains both the scope and context for the data's collection and descriptions for each variable, but does not expect such metadata to be in the same file as the data. For example, see guidance from the UK Data Archive, DMPtool from the University of California Curation Center, and the Dryad Digital Repository.

The XLS page at fileformats.archiveteam.org lists some open-source software libraries available for manipulating files in the binary file format used as the native format by Microsoft Excel 97, 2000, 2002, and Office Excel 2003. Weaponized MS Office 97-2003 legacy/binary formats (doc, xls, ppt, ...) also lists software libraries. Python "packages" for working with .xls files are listed at Working with Excel Files in Python. The compilers of this resource have not determined the extent to which any of the software listed in these resources is actively maintained. Comments welcome.

FreeXL is an open source library to extract valid data from within an Excel (.xls) spreadsheet; this software completely ignores user-interface details. See FreeXL: Other tools and libraries for a list of software supporting the XLS format that was compiled by the author of FreeXL.

Widely used commercial data analysis or file conversion packages that can read and write XLS files include: LeadTools; FME from Safe Software; Mathematica; Maple. File conversion packages that can read XLS files but not write them include: Aspose Cells (including a free online viewer at https://products.aspose.app/cells/viewers). Other file conversion utilities that claim to convert XLS files to XLSX include: Zamzar; xlsgen; and docspal.

    Licensing and patents

Covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).

New features introduced into XLS may be subject to patent protection. However, Microsoft's interoperability principles indicate "Microsoft will also make available a list of any of its patents that cover any extensions, and will make available patent licenses on reasonable and non-discriminatory terms." As of November 2019, the patent map tool provided by Microsoft indicates that there are no patents of concern to users of the [MS-XLS] specification or the [MS-CFB] specification on which it is based.

Transparency The XLS formats (all BIFF versions) are not easily interpreted with basic tools. This is due to techniques used to keep files small and fast to load, such as the use of numeric codes to identify record types, and having the length of each record declared in the file, rather than using fixed-length records or recognizable delimiters for records.
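
The record layout described above (a numeric record type followed by a declared length) can be walked with a few lines of code. The following Python sketch assumes that each record begins with a 2-byte type and a 2-byte data length, both little-endian, as in the BoF record layout given in the Description. The name iter_biff_records is illustrative, and the input is assumed to be a BIFF stream that has already been extracted from its CFB container.

```python
import struct
from typing import Iterator, Tuple


def iter_biff_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_type, record_data) pairs from a raw BIFF stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: numeric type code, then length of the data part.
        rec_type, rec_len = struct.unpack_from("<HH", stream, pos)
        start = pos + 4
        end = start + rec_len
        if end > len(stream):
            raise ValueError(f"record at offset {pos} runs past end of stream")
        yield rec_type, stream[start:end]
        pos = end
```

Because there are no delimiters, a reader that loses track of one record length cannot resynchronize, which is part of why the format is hard to inspect with basic tools.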
Self-documentation

Options for storing document-level metadata in an XLS file are described in a subsidiary specification, [MS-OSHARED]. An XLS file should include a Summary Information property set, which can include the following optional descriptive properties: Title, Author, Subject (description), Keywords, Comments. See 2.3.3.2.1.1 PIDSI. An XLS file may contain an additional property set (known as Document Summary Information) with a fixed set of properties, including Manager and Company names. User-defined or custom properties are also allowed. See also 2.3.3 Property Set Storage.

The [MS-XLS] specification offers no support for embedding metadata in an externally defined schema.

External dependencies An XLS workbook may pull in data from external data sources, for example, by querying a remote database.
Technical protection considerations Since XLS workbooks can contain sensitive information that needs to be protected, XLS files can be protected by encryption that requires a password to decrypt. Several encryption approaches are supported for password protection, as described at 1.3.3 Encryption, within [MS-OFFCRYPTO]. This clause indicates that the XLS format may use XOR obfuscation, 40-bit RC4 encryption, or CryptoAPI RC4 encryption. See 2.2.10 Encryption (Password to Open) from [MS-XLS] for details of which streams in an XLS file are encrypted.
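
As a rough illustration, the presence of password-to-open protection can be detected by looking for a FilePass record in the Workbook Globals Substream. The Python sketch below rests on several assumptions not stated in this description: that FilePass has record type Hex 002F and EOF has record type Hex 000A, that the first 2-byte field of FilePass is 0 for XOR obfuscation and 1 for RC4 (40-bit or CryptoAPI), and that the input is the raw Workbook stream already extracted from the CFB container. Check these values against [MS-XLS] before relying on them. The function name is hypothetical.

```python
import struct
from typing import Optional

FILEPASS = 0x002F  # assumed record type for FilePass
EOF = 0x000A       # assumed record type that ends a substream

ENCRYPTION_TYPES = {0: "XOR obfuscation", 1: "RC4 (40-bit or CryptoAPI)"}


def encryption_in_globals(workbook_stream: bytes) -> Optional[str]:
    """Return a label for the encryption in use, or None if no FilePass."""
    pos = 0
    while pos + 4 <= len(workbook_stream):
        rec_type, rec_len = struct.unpack_from("<HH", workbook_stream, pos)
        data = workbook_stream[pos + 4 : pos + 4 + rec_len]
        if rec_type == FILEPASS:
            if len(data) < 2:
                raise ValueError("FilePass record is too short")
            (enc_type,) = struct.unpack_from("<H", data, 0)
            return ENCRYPTION_TYPES.get(enc_type, "unknown")
        if rec_type == EOF:
            break  # end of the Globals Substream; stop scanning
        pos += 4 + rec_len
    return None
```

The sketch only reports that encryption is declared; it does not distinguish 40-bit RC4 from CryptoAPI RC4 and makes no attempt to decrypt anything.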

Quality and functionality factors

Text
Normal rendering

No specific set of factors for assessing quality and functionality of a spreadsheet format has been developed. Since some spreadsheets have a printable or viewable report as a primary function and others are primarily containers for tabular data, selected factors for assessing formats for text and datasets are relevant. In general, functionality supported in an XLS file is similar to that supported in an XLSX file, except that features added to Excel since 2007, as documented in [MS-XLSX]: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format may not be supported.

BIFF8 was the first encoding for the XLS format that supported Unicode, stored as UTF-16LE (i.e. as UTF-16 in little-endian byte order). Character sets for prior versions of BIFF were based on Windows "code pages."

Dataset
Normal functionality

No specific set of factors for assessing quality and functionality of a spreadsheet format has been developed. Once loaded into a spreadsheet application that supports the XLS format, the functionality of a spreadsheet in the XLS format is expected to be identical to that of the XLSX/OOXML format.

The maximum number of rows for an XLS file is 65,536. The maximum number of columns is 256. The maximum dimensions of an XLSX file are 1,048,576 rows and 16,384 columns. New features added to Microsoft Excel since 2007 are not necessarily supported in the XLS format. Examples include Timeline Slicers (introduced in 2013) and Excel data types for Stocks and Geography (introduced in 2019).
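
When choosing a format for a tabular dataset, the worksheet size limits are easy to test in advance. The Python sketch below assumes the commonly documented Excel 97-2003 worksheet limits of 65,536 rows and 256 columns for XLS, together with the XLSX limits quoted above. The names LIMITS and formats_that_fit are illustrative.

```python
# Worksheet size limits as (rows, columns). The XLS figures are the commonly
# documented Excel 97-2003 limits and are an assumption of this sketch.
LIMITS = {
    "xls": (65_536, 256),
    "xlsx": (1_048_576, 16_384),
}


def formats_that_fit(n_rows: int, n_cols: int) -> list:
    """List the formats whose single-worksheet limits hold the given table."""
    return [
        fmt
        for fmt, (max_rows, max_cols) in LIMITS.items()
        if n_rows <= max_rows and n_cols <= max_cols
    ]
```

A table with 100,000 rows, for example, would fit a single XLSX worksheet but not an XLS worksheet.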

Support for software interfaces (APIs, etc.) Microsoft has provided tools that allow developers to work with XLS spreadsheets programmatically, including Visual Basic for Applications (VBA) and COM (component object model) Automation. For a brief introduction, see Understanding automation from Microsoft. See also Why are the Microsoft Office file formats so complicated? (And some workarounds), a post from 2008 on the Joel On Software blog.
Data documentation (quality, provenance, etc.) The XLS formats have no specific support for embedding rich discipline-specific metadata or codebooks. See Self-documentation in Sustainability Factors above.
Beyond normal functionality See XLSX/OOXML.

File type signifiers and format identifiers

Tag Value Note
Filename extension xls
Documented in the specification and elsewhere by Microsoft in many locations, for example at Office 2007 File Format MIME Types for HTTP Content Streaming.
Internet Media Type application/vnd.ms-excel
Documented by Microsoft at Office 2007 File Format MIME Types for HTTP Content Streaming. Also listed at IANA. See 1996 registration at https://www.iana.org/assignments/media-types/application/vnd.ms-excel.
Magic numbers Hex: D0 CF 11 E0 A1 B1 1A E1
Documented in the CFB specification, in 2.2 Compound File Header. Applies to all files in CFB format; see GCK'S File Signatures Table entry for Compound Binary File format (aka OLECF).
File signature Hex: 3E 00 03 00 FE FF 09 00
At byte offset 24 from beginning of file. Indicates CFB (Compound File Binary format) major version 3, minor version 3E. Assumes that all .xls files in BIFF8 format use this version of CFB. Comments welcome.
File signature Hex 0908....00060500
At byte offset 512 from beginning of file, according to PRONOM records for fmt/61 and fmt/62. This is the beginning of the first BIFF stream in the CFB container. This assumes the first BIFF stream is the Workbook Globals Substream. According to [MS-XLS] 2.1.7.20.3 Globals Substream, "There MUST be exactly one Globals Substream in a Workbook Stream ... , and the Globals Substream MUST be the first substream in the Workbook Stream."
Pronom PUID fmt/61
fmt/62
PRONOM has a number of entries for Microsoft spreadsheet format variants with the .xls extension. The PRONOM entries that appear to correspond to the scope of this format description are http://www.nationalarchives.gov.uk/PRONOM/fmt/61 and http://www.nationalarchives.gov.uk/PRONOM/fmt/62.
Wikidata Title ID Q28858068
See https://www.wikidata.org/wiki/Q28858068 for Binary Interchange File Format, version 8 (aka BIFF8)

Notes

General

Versions of BIFF, the "binary interchange file format," were used as the default formats for Excel spreadsheets, with the file extension .xls, until superseded in Excel 2007 by the XML-based XLSX/OOXML. Excel spreadsheet files with the .xls extension are not all in a single format. For versions of Excel through Excel 4.0 (1992), spreadsheet files could contain only a single worksheet; those early XLS files consisted of a single BIFF stream (through BIFF4). About the .xls binary format, a description from FreeXL, an open-source library to extract data from an XLS spreadsheet, indicates, "There is no .xls file format. It's really a common file suffix applied to many different things." The FreeXL description has a table relating the different versions of BIFF to versions of Excel. Useful details of the early formats are in OpenOffice.org's Documentation of the Microsoft Excel File Format, covering Excel versions 2, 3, 4, 5, 95, 97, 2000, XP, and 2003. Microsoft's own 2007 documentation for the .xls format is available as Microsoft Office Excel 97-2007 Binary File Format Specification [*.xls], which covers BIFF documentation for Excel versions 5, 95, 97, 2000, 2002, 2003, and 2007.

The early XLS formats, used for Excel 2.0 (1987) through Excel 4.0 (1992), allowed only a single worksheet. The corresponding file formats were single BIFF streams.

  • BIFF2 for Excel 2.0 (1987)
  • BIFF3 for Excel 3.0 (1990)
  • BIFF4 for Excel 4.0 (1992)

Note that the extension .xlw was used to support multi-sheet "workspaces" starting with Excel 3.0. However, the .xlw file did not contain user data; it was used to configure Excel's user interface presentation of the component sheets. See .xlw File Extension | ReviverSoft for more detail.

With BIFF5, a new structure was introduced; an XLS file now represented a single Workbook with one or many individual Worksheets. A number of streams, including BIFF streams for individual worksheets, are stored in a Microsoft Compound File Binary File container. The compilers of this resource have experimented with saving spreadsheets as XLS files in current versions of Excel. In all cases, the resulting file was in version 3 of CFB (MS-CFB3). Comments welcome. In BIFF7 and earlier, a record in a BIFF stream has a length limit of 2,084 bytes, including the record type and record length fields.

  • BIFF5 for Excel 5.0 (1993) and Excel 95 (1995)
  • BIFF7 was for an option available in Excel 97. Microsoft's 2007 Specification for Microsoft Office Excel 97-2007 Binary File Format (*.xls) states, "For improved backward compatibility, Excel 97 has a save file type option: Microsoft Excel 97 & 5.0/95 Workbook. When a workbook is saved using this file type, Excel writes two complete book streams. The first stream in the file is the Microsoft Excel 5.0/95 format (BIFF5/BIFF7), and the second one is the Microsoft Excel 97 format (BIFF8). The DSF record, which only appears in the BIFF8 stream, indicates the file is a double stream file. To distinguish the two streams, the BIFF5/BIFF7 stream is called Book, and the BIFF8 stream is called Workbook."
  • BIFF8 for Excel 97 (1997) through Excel 2003. In BIFF8, a BIFF record has a length limit of 8,228 bytes, including the record type and record length fields. As noted above, Excel 97 introduced the BIFF8 format in its double-stream file option. This format description is primarily for the BIFF8 variant of the XLS format.
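
The record length limits given above include the record type and record length fields. Assuming each of those fields occupies 2 bytes, as in the BoF record layout given in the Description, the space left for record data can be worked out as in the following Python sketch. The names are illustrative.

```python
HEADER_BYTES = 4  # 2-byte record type + 2-byte record length (assumed sizes)

# Total record length limits in bytes, header fields included.
MAX_RECORD_BYTES = {
    "BIFF2": 2084, "BIFF3": 2084, "BIFF4": 2084,
    "BIFF5": 2084, "BIFF7": 2084,
    "BIFF8": 8228,
}


def max_record_data(biff_version: str) -> int:
    """Largest data payload, in bytes, that a single record may carry."""
    return MAX_RECORD_BYTES[biff_version] - HEADER_BYTES


def record_fits(biff_version: str, data_len: int) -> bool:
    """True if a record with data_len bytes of data is within the limit."""
    return data_len <= max_record_data(biff_version)
```

Under these assumptions a BIFF8 record can hold 8,224 bytes of data, compared with 2,080 bytes in BIFF7 and earlier.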

Note that BIFF12 is used in a different binary file format, using a different container file and the file extension .xlsb. It has been available as an alternative to the XML-based XLSX since Excel 2007. See MS-XLSB.

Detecting the BIFF version in an XLS file: A BoF record marks the beginning of a Book or Workbook stream in a BIFF file. Its 2-byte record number field is stored in little-endian order, so the record type byte, with a value of Hex 09, comes first in the physical file and the high-order byte comes second. For BIFF2 through BIFF4, the file consists of a single stream and the BIFF version is found from the high-order byte of the record number field in the BoF record that begins the file. The values that identify these BIFF versions are: Hex 00 for BIFF2; Hex 02 for BIFF3; Hex 04 for BIFF4.

For BIFF5, BIFF7 and BIFF8, the BIFF version is not identified so close to the beginning of the file, because the start of the file identifies the file as a CFB container. According to OpenOffice.org's Documentation of the Microsoft Excel File Format, the BIFF version can be identified in the BIFF stream that represents the Workbook Globals Substream. Within this BIFF stream, the two-byte vers field at offset 4 identifies the BIFF version. This field is Hex 0500 for BIFF5 or BIFF7 and Hex 0600 for BIFF8; because integers are stored in little-endian order, these values appear in the physical file as bytes 00 05 and 00 06, respectively.
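
The detection steps above can be sketched in Python as follows. The input is assumed to be the raw bytes of a BIFF stream: the whole file for BIFF2 through BIFF4, or the Book or Workbook stream already extracted from the CFB container for later versions (extraction is not shown). Bytes are read in physical file order, so the Hex 09 byte comes first and the version byte second, matching the 0908 value listed in the Description. The function name biff_version is illustrative.

```python
import struct

# High-order byte of the BoF record number for single-stream files.
EARLY_VERSIONS = {0x00: "BIFF2", 0x02: "BIFF3", 0x04: "BIFF4"}
# Value of the vers field at offset 4 when the high-order byte is Hex 08.
LATER_VERSIONS = {0x0500: "BIFF5/BIFF7", 0x0600: "BIFF8"}


def biff_version(stream: bytes) -> str:
    """Identify the BIFF version from the BoF record that starts a stream."""
    if len(stream) < 4:
        raise ValueError("stream is too short to hold a BoF record")
    rec_number, _rec_len = struct.unpack_from("<HH", stream, 0)
    if rec_number & 0xFF != 0x09:
        raise ValueError("stream does not start with a BoF record")
    high = rec_number >> 8
    if high in EARLY_VERSIONS:
        return EARLY_VERSIONS[high]
    if high == 0x08 and len(stream) >= 6:
        (vers,) = struct.unpack_from("<H", stream, 4)
        if vers in LATER_VERSIONS:
            return LATER_VERSIONS[vers]
    raise ValueError("unrecognized BIFF version")
```

The sketch cannot tell BIFF5 from BIFF7, since both carry the same vers value.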

History The BIFF8 version of Microsoft's XLS format (MS-XLS) was introduced in 1997 and was the default file format for Microsoft Excel Workbooks through Excel 2003. See General notes, immediately above, for information on chronological versions of BIFF. Starting with Excel 2007, the default format for Excel Workbooks has been XLSX/OOXML.


Last Updated: 12/08/2020