Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Microsoft Excel Binary (XLSB) File Format (option in Excel 2007 and later) |
---|---|
Description |
The Microsoft Excel Binary File format, with the .xlsb extension and referred to here as XLSB, was introduced in Microsoft Office 2007, at the same time as the XML-based formats of the Office Open XML (OOXML) family. Unlike that family of formats, XLSB is not an international standard. It is a proprietary Microsoft format for spreadsheets that has been available as a full-fidelity alternative to the default XLSX format since Excel 2007. It is intended for users who need to load and save large data files as fast as possible. Although proprietary, it is covered by Microsoft's Open Specification Promise and its structure is documented at [MS-XLSB]. The XLSB format was originally sometimes referred to as BIFF12, as in “binary file format for Office 12.” XLSB uses the same Open Packaging Conventions (OPC/OOXML) container used by the Office Open XML formats. OPC is a container based on ZIP_6_2_0; hence you can open an XLSB file with any ZIP tool to see what’s inside. A diagram that illustrates the structure of an XLSB file as unzipped is shown under the heading "File Format Number 2" in All About File Formats, a blog post from Microsoft in July 2006, when Excel 2007 was in beta test. "File Format Number 1" in the same post shows the equivalent XLSX file. Unlike the XLSX example, in which all the parts within the package are encoded in XML, the XLSB file has many parts with .bin extensions. Only a few supporting parts are in XML; the data and formulas are stored in a binary form that is closer to the form needed in memory. For more technical detail on the structure of the XLSB format, see Notes below. The XLSB format supports macros. This is in contrast to the XLSX format; in the OOXML family of XML-based formats, the .xlsm extension must be used for spreadsheets with macros. The specification for XLSB is frequently updated. With each new feature in Excel, specifications for the corresponding markup have been added to the [MS-XLSB] documentation, with changes documented.
See, for example, the Change Tracking section in [MS-XLSB]-120120, which includes changes associated with new features for timelines and data "slicers." See Excel 2010 & 2013 Data Slicers Taking Pivot Tables To A New Level Of Control for information about the new features. As of late 2019, the most recent major change is to support Dynamic Arrays, a feature that appears to have been introduced first in Google Sheets. See Preview of Dynamic Arrays in Excel. The XLSB format is supported in very few non-Microsoft products. As of November 2019, Microsoft's File Format Reference for Word, Excel, and PowerPoint states of XLSB, "this is not an XML file format and is therefore not optimal for accessing and manipulating content without using Excel 2019, Excel 2016, Excel 2013, Excel 2010 or Excel 2007 and the object model." Since XLSB is not part of the XML-based Office Open XML family of formats, it is not supported for programmatic manipulation in the open-source Open XML SDK 2.5 for Office. |
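Because the container is ZIP-based, the parts of a package can be listed with any ZIP library. The following is a minimal sketch in Python, not a definitive tool. The function names `list_parts` and `build_demo_package` are hypothetical, and the demo package is a tiny stand-in built in memory rather than a valid workbook; classifying parts by filename extension is an assumption made for illustration.

```python
import io
import zipfile


def list_parts(source):
    """Return (name, size, kind) for every part in a ZIP-based package."""
    with zipfile.ZipFile(source) as package:
        parts = []
        for info in package.infolist():
            name = info.filename
            # Assumption: the extension is a good-enough hint of the encoding.
            if name.endswith(".bin"):
                kind = "binary"
            elif name.endswith((".xml", ".rels")):
                kind = "xml"
            else:
                kind = "other"
            parts.append((name, info.file_size, kind))
        return parts


def build_demo_package():
    """Build a tiny stand-in package in memory (not a valid workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("xl/workbook.bin", b"\x00")
        package.writestr("xl/worksheets/sheet1.bin", b"\x00\x00")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size, kind in list_parts(build_demo_package()):
        print(f"{kind:6} {size:4d} {name}")
```

To inspect a real file, pass its path to `list_parts` in place of the in-memory demo package.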
Production phase | Can be used in any production phase: for creating documents (initial state); for editing and review (middle state); and for final distribution. |
Relationship to other formats | |
Subtype of | OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012 |
Subtype of | ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE). Various features of the ZIP File Format are not permitted in OPC. Details on the use of ZIP in OPC are in section 10 and Annex C of ISO/IEC 29500-2:2012. |
Affinity to | XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2016, ECMA-376, Editions 1-5. An XLSB file without macros can be created from or converted to an XLSX file with full fidelity. |
LC experience or existing holdings | Library staff creating spreadsheets as part of their duties typically use the XLSX format. As of late 2020, the Library of Congress had 556 files with the .xlsb extension in its digital collections, with a total size of over 7 gigabytes. |
---|---|
LC preference | For works acquired for its collections, the Library of Congress Recommended Formats Statement for Datasets lists XLSX (.xlsx) as a preferred format. The binary (.xls) format is also listed as preferred. XLSB is not listed as either acceptable or preferred. |
Disclosure | The Microsoft Excel Binary XLSB file format is proprietary but openly documented and covered by Microsoft's Open Specification Promise. |
---|---|
Documentation | The specification is available at [MS-XLSB]: Excel (.xlsb) Binary File Format. Note that the specification is updated frequently, with many revisions described by Microsoft as "major." |
Adoption |
Not widely supported except in Microsoft products. The primary advantage of XLSB over XLSX is faster loading and saving of very large spreadsheets. Some users (individual or institutional) may choose to make XLSB the default format for saving spreadsheet workbooks. The compilers of this resource have not been able to determine how widely the XLSB format is used. Comments welcome. Among the commercial products that support the format are some software libraries aimed at developers. Xlsgen describes itself as "a software component which reads, writes, calculates, renders and print any Excel 97 / 2000 / XP / 2003 / 2007 / 2010 / 2013 / 2016 spreadsheets." Among the languages that can be used with xlsgen are VB, VBScript, VBA, VB.NET, C#, C, C++, Java, Delphi, Perl, and Python. Aspose Cells is a family of spreadsheet programming libraries for various programming languages, including Java and C++, and for environments including .NET and Android. The Aspose libraries support XLSB for reading, writing, rendering, and printing. OpenOffice and LibreOffice, free user applications based on open-source code, can read files in XLSB format but cannot write them. The compilers of this resource have not determined whether these open-source applications can handle features added to the XLSB format since its initial release. Comments welcome. |
Licensing and patents |
Covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification). |
Transparency | Although it is possible to look inside the ZIP-based wrapper and some parts of the package are in XML, such as the metadata in core.xml, the .bin files that contain the main worksheet content cannot be interpreted or used without spreadsheet software that supports the format. |
Self-documentation |
The File Properties features of XLSB build on the features supported for XLSX/OOXML. Limited capabilities for descriptive metadata for XLSB and XLSX files as a whole are supported as Core Properties as specified in OPC/OOXML. Beyond the Core Properties, the XLSB format uses the Custom File Properties and Extended File Properties features of Part 1 of the OOXML family of specifications (ISO/IEC 29500 and ECMA-376) to store names for component worksheets and for named ranges in worksheets. See Useful References below for links to the relevant normative references for the XLSB specification. |
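Since the Core Properties are stored as XML, they can be read without spreadsheet software. The sketch below is illustrative only. It assumes the Core Properties part sits at the conventional name `docProps/core.xml`, whereas a robust reader would locate it through the package relationships; the function name `read_core_properties` and the sample metadata are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OPC Core Properties (Dublin Core based).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

WANTED = {
    "title": "dc:title",
    "creator": "dc:creator",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source, part_name="docProps/core.xml"):
    """Return a dict of selected Core Properties, or {} if the part is absent.

    Assumption: the part uses its conventional name; this is not guaranteed.
    """
    with zipfile.ZipFile(source) as package:
        if part_name not in package.namelist():
            return {}
        root = ET.fromstring(package.read(part_name))
    properties = {}
    for key, path in WANTED.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            properties[key] = element.text
    return properties


def build_demo_package():
    """Build an in-memory package holding only made-up Core Properties."""
    core_xml = (
        "<cp:coreProperties "
        f'xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
        f'xmlns:dcterms="{NS["dcterms"]}">'
        "<dc:title>Budget</dc:title>"
        "<dc:creator>A. Librarian</dc:creator>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(build_demo_package()))
```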
External dependencies | An XLSB workbook may pull in data from external data sources, such as databases. |
Technical protection considerations | An XLSB workbook can contain sensitive information that needs to be protected. A file can be protected by encrypting it using a password. Once a file is encrypted, the data can only be accessed by decrypting the file using a password. |
Dataset | |
---|---|
Normal functionality | No specific set of factors for assessing quality and functionality of a spreadsheet format has been developed. Once loaded into a spreadsheet application that supports the XLSB format, the functionality of a spreadsheet in the XLSB format is expected to be identical to that of the XLSX/OOXML format. |
Beyond normal functionality | The special features of the XLSB format compared to the XLSX format are: file size for storage and transmission is usually smaller for an XLSB file; time to load into an application is typically less for an XLSB file. |
Tag | Value | Note |
---|---|---|
Filename extension | xlsb | |
Internet Media Type | application/vnd.ms-excel.sheet.binary.macroEnabled.12 | Documented by Microsoft at Office 2007 File Format MIME Types for HTTP Content Streaming. Also listed at IANA. See 2011 registration at https://www.iana.org/assignments/media-types/application/vnd.ms-excel.sheet.binary.macroEnabled.12. |
File signature | See note. | There is no immediately accessible string pattern for identifying an XLSB file, which uses a container of the format OPC/OOXML_2012, which is a constrained implementation of ZIP_6_2_0. |
Other | Target="xl/workbook.bin" | This signifier assumes the usual name of the main part of an XLSB file. The target declaration will occur in the top-level Relationships part (the \_rels/.rels part in the OPC package of an XLSB file), as an attribute of a <Relationship> element within the <Relationships> element. In an XLSB file, it will be a relationship of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. |
Pronom PUID | fmt/595 | PRONOM entry for Microsoft Excel Non-XML Binary Workbook, 2007 onwards. See http://www.nationalarchives.gov.uk/PRONOM/fmt/595. |
Wikidata Title ID | Q66759528 | See https://www.wikidata.org/wiki/Q66759528. |
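The "Other" signifier above can be checked programmatically. The following Python sketch is a heuristic, not a formal identification method: it reads the top-level Relationships part, finds the relationship of the officeDocument type, and treats a target ending in `.bin` as a sign of XLSB. The function names and the `.bin` suffix test are assumptions made for illustration; the demo packages are synthetic.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/officeDocument"
)


def main_part_target(source):
    """Return the Target of the officeDocument relationship, or None."""
    try:
        with zipfile.ZipFile(source) as package:
            rels_xml = package.read("_rels/.rels")
        root = ET.fromstring(rels_xml)
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        return None  # not a ZIP, no relationships part, or malformed XML
    for relationship in root.iter(f"{{{RELS_NS}}}Relationship"):
        if relationship.get("Type") == OFFICE_DOCUMENT:
            return relationship.get("Target")
    return None


def looks_like_xlsb(source):
    """Heuristic: the main part is binary, e.g. xl/workbook.bin."""
    target = main_part_target(source)
    return target is not None and target.lstrip("/").lower().endswith(".bin")


def make_package(target):
    """Build a synthetic package whose main part has the given target."""
    rels = (
        f'<Relationships xmlns="{RELS_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="{target}"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("_rels/.rels", rels)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(looks_like_xlsb(make_package("xl/workbook.bin")))
    print(looks_like_xlsb(make_package("xl/workbook.xml")))
```

Checking the relationship type first, rather than searching for the literal string, avoids depending on attribute order or whitespace in the XML.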
General |
The XLSB workbook data is contained in a ZIP package conforming to the Open Packaging Conventions (OPC/OOXML). Individual files stored in the ZIP package, called "parts" in the specification, contain information about the content of a workbook including workbook data such as worksheet definitions. A few parts store information in XML and other parts may contain information stored as a binary stream of bytes, e.g., for embedded images. But the majority of the workbook structure and content is stored as binary "records." Each binary record contains its record type (as an integer code), record length (in bytes), and zero or more type-specific fields depending on its record type. This deceptively simple structure hides the complexity of the structure of spreadsheet data. As of 2019, the [MS-XLSB] specification includes 844 record types. Shortly after the release of Excel 2007 in beta test, Stephane Rodriguez did some reverse engineering of the binary format. His analysis in Office 2007 .bin file format complements the original specification for XLSB, which was not published until after his reverse-engineering efforts. The details that follow are derived from the [MS-XLSB] specification and the analysis by Rodriguez. An XLSB file created by Excel contains exactly one Workbook part, workbook.bin. The Workbook part refers to Worksheet parts (e.g. worksheet1.bin, worksheet2.bin). The specification states, "Unless otherwise specified, all data in files of the type specified by this document are stored in little-endian format." See Wikipedia entry on endianness. Numeric values in XLSB files are stored according to the "IEEE Standard for Binary Floating-Point Arithmetic", IEEE 754-1985, October 1985. See Wikipedia entry for IEEE 754. Text strings are stored as arrays of 16-bit Unicode characters prefixed by an integer value for the length of the string.
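The record layout and value encodings described above can be sketched in Python. This is an illustration under stated assumptions, not a conforming reader: it assumes the record type and record length are variable-length integers carrying 7 data bits per byte (up to 2 and 4 bytes respectively), and that the string length prefix is a 32-bit little-endian count of 16-bit characters. Both assumptions should be checked against [MS-XLSB]. The record type numbers in the demo are arbitrary and do not correspond to real record types.

```python
import io
import struct


def read_varint(stream, max_bytes):
    """Read an integer stored 7 bits per byte; a set high bit means 'more'."""
    value = 0
    for index in range(max_bytes):
        chunk = stream.read(1)
        if not chunk:
            if index == 0:
                return None  # clean end of stream
            raise EOFError("truncated variable-length integer")
        value |= (chunk[0] & 0x7F) << (7 * index)
        if not chunk[0] & 0x80:
            break
    return value


def iter_records(stream):
    """Yield (record_type, payload) pairs until the stream is exhausted."""
    while True:
        record_type = read_varint(stream, 2)
        if record_type is None:
            return
        size = read_varint(stream, 4)
        if size is None:
            raise EOFError("record type without a length")
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record payload")
        yield record_type, payload


def decode_double(payload, offset=0):
    """Decode a little-endian IEEE 754 double-precision number."""
    return struct.unpack_from("<d", payload, offset)[0]


def decode_wide_string(payload, offset=0):
    """Decode a length-prefixed UTF-16LE string; return (text, next_offset)."""
    (count,) = struct.unpack_from("<I", payload, offset)  # assumed 32-bit count
    start = offset + 4
    end = start + 2 * count
    return payload[start:end].decode("utf-16-le"), end


def encode_varint(value):
    """Inverse of read_varint, used only to build the demo stream."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def build_demo_stream():
    """Two made-up records: type 300 holds a number, type 5 holds a string."""
    number = struct.pack("<d", 1.5)
    text = struct.pack("<I", 5) + "Total".encode("utf-16-le")
    return (
        encode_varint(300) + encode_varint(len(number)) + number
        + encode_varint(5) + encode_varint(len(text)) + text
    )


if __name__ == "__main__":
    for record_type, payload in iter_records(io.BytesIO(build_demo_stream())):
        print(record_type, len(payload))
```

A real reader would dispatch on the record type to decide how to interpret each payload; here the demo simply knows which record holds which kind of value.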
Formulas are stored in a compact tokenized representation known as a "parsed expression" containing a sequence of parse tokens. The tokens represent operators, operands, functions, etc. For example, each function supported in Excel is identified by an unsigned integer, which is used as the token. Tokens are ordered according to Reverse Polish notation, a commonly used logical system for the specification of mathematical formulas in an order that corresponds to the order required for evaluation. A parsed expression is converted into a textual formula at runtime for display and user editing. [Note: The term "formula bytecode" has also been used for the representation termed "parsed expression" in the XLSB specification; Rodriguez uses this term in his analysis.] |
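To illustrate how a token sequence in Reverse Polish order can be turned back into a textual formula, here is a small stack-based sketch in Python. The token representation (tuples such as `("num", 1)` or `("func", "SUM", 2)`) is hypothetical and chosen for readability; it is not the binary token encoding defined in [MS-XLSB], and only a handful of operators and functions are handled.

```python
import operator

BINARY_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# Hypothetical function table; each entry takes a list of argument values.
FUNCTIONS = {"SUM": sum, "MAX": max}


def to_text(tokens):
    """Rebuild a textual formula from tokens in Reverse Polish order."""
    stack = []
    for kind, value, *extra in tokens:
        if kind == "num":
            stack.append(str(value))
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            # Parenthesize every operation so precedence is never ambiguous.
            stack.append(f"({left}{value}{right})")
        elif kind == "func":
            split = len(stack) - extra[0]  # extra[0] is the argument count
            arguments = stack[split:]
            del stack[split:]
            stack.append(f"{value}({','.join(arguments)})")
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return "=" + stack[0]


def evaluate(tokens):
    """Evaluate the same token sequence with a value stack."""
    stack = []
    for kind, value, *extra in tokens:
        if kind == "num":
            stack.append(value)
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPERATORS[value](left, right))
        elif kind == "func":
            split = len(stack) - extra[0]
            arguments = stack[split:]
            del stack[split:]
            stack.append(FUNCTIONS[value](arguments))
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


if __name__ == "__main__":
    # Operands come first, then the operator or function that consumes them.
    demo = [("num", 1), ("num", 2), ("op", "+"), ("num", 3), ("func", "SUM", 2)]
    print(to_text(demo))
    print(evaluate(demo))
```

The same left-to-right pass serves both display and evaluation, which is why the postfix order is convenient: no precedence rules or lookahead are needed.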
---|---|
History |
The XLSB format was introduced by Microsoft in 2006 for Office 2007 as one of the formats in which an Excel spreadsheet could be saved. Office 2007 was also known as Office 12 and the XLSB format was often referred to as BIFF12 at that time. The original documentation for the format was released as [MS-XLSB] in 2007. With each new feature in Excel, specifications for the corresponding markup, such as new elements and attributes, have been added to the [MS-XLSB] documentation, with changes from the previous version documented in a clause titled Change Tracking. |
|