|Full name||XLSX (Office Open XML, SpreadsheetML), ISO 29500:2008-2016; also ECMA-376, Editions 1-5.|
The Office Open XML-based spreadsheet format using .xlsx as a file extension has been the default format produced for new documents by versions of Microsoft Excel since Excel 2007. The format was designed to be equivalent to the binary .xls format produced by earlier versions of Microsoft Excel. For convenience, this format description uses XLSX to identify the corresponding format. The primary content of an XLSX file is marked up in SpreadsheetML, which is specified in parts 1 and 4 of ISO/IEC 29500, Information technology -- Document description and processing languages -- Office Open XML File Formats (OOXML). This description focuses on the specification in ISO/IEC 29500:2012 and represents the format variant known as "Transitional." Although editions of ISO 29500 were published in 2008, 2011, 2012, and 2016, the specification has had very few changes other than clarifications and corrections to match actual usage in documents since SpreadsheetML was first standardized in ECMA-376, Part 1 in 2006. This description can be read as applying to all SpreadsheetML versions published by ECMA International and by ISO/IEC through 2016. See Notes below for more detail on the chronological versions and differences.
The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet "document." Conceptually, using the terminology of the SpreadsheetML specification in ISO/IEC 29500-1, the document comprises one or more worksheets in a workbook. A worksheet typically consists of a rectangular grid of cells. Each cell can contain a value or a formula used to calculate a value. For a formula cell, the most recently calculated value is usually cached until the next recalculation. A single spreadsheet document may serve several purposes: as a container for data values; as program code (based on the formulas in cells) to perform analyses on those values; and as one or more formatted reports (including charts) of the analyses. Beyond the basics, spreadsheet applications have introduced support for more advanced features over time. These include mechanisms to extract data dynamically from external sources, to support collaborative work, and to perform an increasing number of functions that would have required a database application in the past, such as sorting and filtering of entries in a table to display a temporary subset. The markup specification must support both basic and more advanced functionality in a structure that delivers the robust performance users expect.
An XLSX file is packaged using the Open Packaging Conventions (OPC/OOXML_2012, itself based on ZIP_6_2_0). The package can be explored by opening it with ZIP software, which typically requires changing the file extension to .zip. The top level of a minimal package will typically have three folders (_rels, docProps, and xl) and one file part ([Content_Types].xml). The xl folder holds the primary content of the document. This includes the file part workbook.xml and a worksheets folder containing a file for each worksheet. It also includes other files and folders that support functionality (such as controlling calculation order) and presentation (such as formatting styles for cells) for the spreadsheet. Any embedded graphics are also stored in the xl folder as additional parts. The other folders and parts at the top level of the package support efficient navigation and manipulation of the package:
The standards documents that specify this format run to over six thousand pages. Useful introductions to the XLSX format can be found at:
|Production phase||Can be used in any production phase: for creating documents (initial state); for editing and review (middle state); and for final distribution.|
|Relationship to other formats|
|Subtype of||OOXML_Family, OOXML (ISO/IEC 29500) Format Family|
|Subtype of||OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012|
|May contain||MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2015, ECMA-376, Editions 1-4|
|Has modified version||XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500-1:2008-2016. The Strict variant of XLSX disallows legacy markup specified only in Part 4 of ISO/IEC 29500. The Strict variant has less support for backwards compatibility when converting documents from older formats.|
|Has modified version||Associated template format using extension .xltx, not described separately on this website. A .xltx template file is a SpreadsheetML document based on the same schema and namespaces (specified in ISO/IEC 29500) as a .xlsx file. The difference is its intended use.|
|Affinity to||Associated formats for SpreadsheetML documents or templates with embedded macros, using file extensions .xlsm and .xltm, not described separately at this website. The language used by Microsoft for macros, VBA, is not covered by the ISO/IEC 29500 specification, but is fully documented by Microsoft. Macros are embedded as separate parts in the OPC package. Macros are widely used in corporate spreadsheets.|
|Affinity to||A hybrid variant, XLSB, not described separately on this site. XLSB uses the OPC package structure, but stores the spreadsheet data in binary form. This variant was introduced for performance (and space-saving) reasons for very large spreadsheets, particularly in relation to loading. It is not part of ISO/IEC 29500 or ECMA-376 and is largely unsupported outside Microsoft Excel. XLSB was important when the XLSX format was first introduced, but more powerful processors and multi-threaded applications may mean that its use will decline. Comments welcome.|
|Defined via||XML, Extensible Markup Language (XML)|
|LC experience or existing holdings||Used by Library of Congress staff.|
|LC preference||For works acquired for the Library's collections, the Library of Congress Recommended Formats Statement for Datasets/Databases, as of June 2016, lists XLSX (.xlsx) as a preferred format for datasets. The binary (.xls) format is also listed as a preferred format.|
|Disclosure||International open standard. Maintained by ISO/IEC JTC1 SC34/WG4. Originated by Microsoft Corporation and first standardized through ECMA International in 2006. Approval as ISO/IEC 29500 was in 2008.|
ISO/IEC 29500-1, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 1: Fundamentals and Markup Language Reference and ISO/IEC 29500-4, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 4: Transitional Migration Features. Latest version (dated 2016 as of February 2017) is available from ISO/IEC Publicly Available Standards.
The Transitional variant of XLSX is specified by applying the differences described in Part 4 (Transitional Migration Features) to the specification in Part 1. Part 4 cannot be read without detailed reference to subclauses in Part 1.
Very widely used. XLSX was originally developed by Microsoft as an XML-based format to replace the proprietary binary format that uses the .xls file extension. Since Excel 2007, XLSX has been the default format for the Save operation. The market share for the Microsoft Office productivity suite is declining. However, according to Gartner, as reported by CNN Money in Nov 2013, its share in the enterprise arena was still 90% in 2012.
Microsoft Excel (and hence the XLSX format) is extremely widely used in corporate settings. Recent versions have introduced increasingly powerful capabilities for drawing data dynamically from other sources, and performing sophisticated analyses. See, for example, Is Excel the Next Killer BI (Business Intelligence) App?, a June 2014 post in the SQL Server BI blog. The software supports add-ins and APIs for data import, and there is an associated industry of consultants and data suppliers. For example, the Federal Reserve Bank of St. Louis Economic Data (FRED) add-in supports dynamic extraction of macroeconomic data. Most of the new application features introduced in Excel 2010 and 2013 are designed to support new options for data extraction and more powerful analyses. In the context of large-scale corporate management and the financial sector, the dominance of Excel and of the XLSX format for spreadsheets looks likely to continue.
As of late 2014, competition has been active between Google and Microsoft for the market for office suites on mobile devices. Both players now support direct editing of the XLSX format for spreadsheets through free apps. A Google Drive blog post from June 25, 2014 announced that Google Apps for Android could now edit Office files natively, without format conversion and that the same capability is available online when using the Chrome browser. The first free Microsoft apps for the iPad had only supported viewing of OOXML files; creation or editing required an Office 365 subscription. However, in November 2014, Microsoft announced that updated Office apps for iPad would support creating and editing of OOXML files. Versions for Android followed in January 2015. See System requirements for Office: Mobile devices and Wikipedia article on Microsoft Office Mobile Apps.
Wikipedia's articles Office Open XML: Application Support and List of software that supports Office Open XML document support for the format in a wide variety of applications and file conversion software. These include the open source LibreOffice (read and write support) and Apache OpenOffice (read support). In June 2014, Microsoft released its Open XML SDK (first made available in 2007) as open source.
The corpus of existing documents on the web is still dominated by the binary .xls format. A Google search in November 2014 of the U.S. web by filetype yielded: .xls, 7,800,000; .xlsx, 1,570,000; .ods, 53,700. A comparison between .xls and .xlsx for files newly indexed in the past year showed roughly equal numbers of .xls and .xlsx files: .xls, 20,000; .xlsx, 18,000. The compilers of this resource acknowledge that searches of the web are not a reliable measure of adoption for spreadsheet file formats at the initial (creation) phase of a content lifecycle. Most spreadsheets are private and those that are made available on the web are likely to be converted to the format considered most likely to be usable by the intended audience.
XLSX and its predecessor binary .xls format appear relatively frequently on lists of acceptable formats for archiving of data. In this context, the assumption is usually that the data per se is stored in a worksheet as a rectangular grid with columns representing variables/measurements and rows representing observations. Recommended practice for archiving datasets always calls for a "codebook" or other documentation. This documentation should explain the scope and context of the data's collection and describe each variable, but it is not expected to be in the same file as the data. For example, see recommendations from the Library of Congress, UK Data Archive, U.S. National Archives (NARA), University of California Curation Center, and the Dryad Digital Repository.
The compilers of this resource are not aware of any spreadsheet applications other than Excel 2013 (or equivalent Excel Online or Excel App) that can create the Strict variant of XLSX (as defined in Part 1 of the ISO/IEC 29500 standard). Tests in February 2017 indicated that Google Sheets and LibreOffice both created new documents in the Transitional variant described in this document, as indicated by the namespace declarations. They did so even when the document included no elements or attributes beyond those present in the Strict versions of the schemas. This corresponds to the default behavior of Microsoft Excel.
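The namespace check mentioned above can be sketched as follows. This is a simplified illustration. It inspects only the namespace of the root element of the workbook part, which is assumed to have been read from the package already. For Transitional it uses the namespace listed among the signifiers for this format. For Strict it uses http://purl.oclc.org/ooxml/spreadsheetml/main, the namespace used by ISO/IEC 29500-1 Strict documents, which should be verified against the edition in use. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root-element namespaces of the SpreadsheetML workbook part.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/spreadsheetml/main"


def spreadsheetml_variant(workbook_xml: bytes) -> str:
    """Classify a workbook part as 'Transitional', 'Strict', or 'unknown'
    from the namespace of its root <workbook> element only."""
    root = ET.fromstring(workbook_xml)
    # ElementTree reports qualified names as "{namespace}localname".
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace == TRANSITIONAL_NS:
        return "Transitional"
    if namespace == STRICT_NS:
        return "Strict"
    return "unknown"
```

A more thorough check would also look at relationship types and other parts, since a single namespace test cannot confirm conformance to either variant.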
|Licensing and patents||
The specification originated from Microsoft Corporation. Current and future versions of ISO/IEC 29500 and ECMA-376 are covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).
Features introduced into XLSX through the MCE mechanism may be subject to patent protection. However, Microsoft's interoperability principles indicate "Microsoft will also make available a list of any of its patents that cover any extensions, and will make available patent licenses on reasonable and non-discriminatory terms."
The structure and text of an XLSX file are all represented in XML and hence viewable without special tools. XML-aware tools that can show and parse the element hierarchy make viewing and interpretation more convenient. The most commonly used parts, elements, and attributes have names that will be quickly recognizable to a human reader. For example, the element <c> defines the content of a cell, the element <f> holds a formula, and <v> holds a value. The syntax of formulas is relatively intuitive and built-in functions have meaningful names. Both are documented in subclause §18.17 of ISO/IEC 29500-1. Simple documents can be interpreted with very basic tools. However, interpreting the semantics of many elements, and how elements and attributes correspond to spreadsheet application functionality, requires understanding of both the schema and the textual specification. The specification provides a primer and valuable examples, including the use of Styles to control cell formatting and of Pivot Tables. Additionally, not all normative constraints for XLSX can be represented fully in the W3C XML Schema Language (XML_Schema_1_0).
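A minimal sketch of reading the <c>, <f>, and <v> elements with Python's standard xml.etree.ElementTree, assuming the worksheet part has already been extracted from the package as bytes. The function name read_cells is hypothetical. Formula text in <f> is stored without the leading "=" that applications display. Shared formulas (<f t="shared">) are not expanded in this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_cells(sheet_xml: bytes) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """List (cell reference, formula, cached value) for each <c> element in a
    Transitional worksheet part. Values are returned as raw strings; a <v>
    holding a shared-string index is not resolved here."""
    root = ET.fromstring(sheet_xml)
    cells = []
    for c in root.iterfind("m:sheetData/m:row/m:c", NS):
        f = c.find("m:f", NS)  # formula, if any
        v = c.find("m:v", NS)  # cached or literal value, if any
        cells.append((c.get("r"),
                      f.text if f is not None else None,
                      v.text if v is not None else None))
    return cells
```

For a row holding 2 and 3 in A1 and B1, and the formula A1*B1 in C1, this would yield the formula and its cached value 6 side by side.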
The organization of parts in the XLSX package may be unintuitive; for example, the textual value for a cell may be stored in a separate part, usually called /xl/sharedStrings.xml, rather than in the cell, which instead contains a reference to an entry in that part. This technique allows a frequently used text value to be stored once and referred to many times.
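The indirection through the shared strings part can be sketched as below. The helper names are hypothetical. The sketch assumes that when a cell's t attribute is "s", its <v> element holds a zero-based index into the list of <si> items. Other cell types, such as inline strings, are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def load_shared_strings(sst_xml: bytes) -> List[str]:
    """Return the string items (<si>) of a sharedStrings part, in order.

    An item holds either one <t> element or a sequence of rich-text runs
    (<r><t>...</t></r>); run text is concatenated. Phonetic hints (<rPh>)
    are deliberately ignored.
    """
    root = ET.fromstring(sst_xml)
    items = []
    for si in root.findall("m:si", NS):
        texts = si.findall("m:t", NS) + si.findall("m:r/m:t", NS)
        items.append("".join(t.text or "" for t in texts))
    return items


def cell_text(cell: ET.Element, shared: List[str]) -> Optional[str]:
    """Return a <c> element's value; when t="s", <v> holds an index into
    the shared string table rather than the text itself."""
    v = cell.find("m:v", NS)
    if v is None or v.text is None:
        return None
    return shared[int(v.text)] if cell.get("t") == "s" else v.text
```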
For transparency of the package containing the constituent parts of the XLSX file, see OPC/OOXML_2012.
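As a minimal sketch, the package can also be listed programmatically without renaming the file. This example uses Python's standard zipfile module. The function name summarize_package is hypothetical. It simply groups ZIP entry names by top-level folder, so a typical XLSX file would be expected to show _rels, docProps, xl, and the root-level [Content_Types].xml.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict, List


def summarize_package(data: bytes) -> Dict[str, List[str]]:
    """Group the entry names of an OPC package (e.g. an .xlsx file) by
    top-level folder. Entries at the package root, such as
    [Content_Types].xml, are grouped under the empty string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            top, sep, _rest = name.partition("/")
            groups[top if sep else ""].append(name)
    return dict(groups)


# Usage (hypothetical file name):
# with open("example.xlsx", "rb") as f:
#     for folder, parts in sorted(summarize_package(f.read()).items()):
#         print(folder or "(root)", parts)
```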
The property file /docProps/core.xml is usually present for OPC packages, although all elements in this Core Properties part are optional. For more on self-documentation of the package containing the constituent parts of the XLSX file, see OPC/OOXML_2012.
A single optional part with a pre-defined set of extended properties for the package is permitted. Microsoft uses the part name /docProps/app.xml for this, and it is always present in XLSX files created by Microsoft. The extended properties (each optional and non-repeatable) are primarily administrative. Apart from Company, they are not related to the intellectual nature of the document or the context for its creation or use. Elements include: name of creating application; version of creating application; document security level; and information about embedded hyperlinks. LibreOffice uses the same part names for the core and extended properties parts, but the extended properties part typically records fewer properties. LibreOffice does identify itself as the creating application for non-empty documents. In November 2014, a newly created XLSX file downloaded from Google Sheets did not contain any properties parts.
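A sketch of reading the simple-valued extended properties, such as the name of the creating application, from /docProps/app.xml. The function name is hypothetical. The namespace is the Extended Properties namespace used in Transitional OOXML. Complex-valued properties such as HeadingPairs are skipped. Given the variation described above, callers should expect any property, or the whole part, to be absent.

```python
import xml.etree.ElementTree as ET

EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def extended_properties(app_xml: bytes) -> dict:
    """Map each simple (childless) Extended Properties element to its text,
    e.g. {'Application': 'Microsoft Excel', ...}. Empty elements map to None."""
    root = ET.fromstring(app_xml)
    props = {}
    for child in root:
        if child.tag.startswith("{" + EP_NS + "}") and len(child) == 0:
            props[child.tag.split("}", 1)[1]] = child.text
    return props
```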
The nature of the OPC package would permit the addition of a part that included rich XML-based metadata, preferably in a well-known schema, and that was listed in the relationships file associated with the Core Properties part with an appropriate relationship type. However, no part of ISO/IEC 29500 predefines such a relationship. Embedding such a part in an OPC package could be done without affecting the primary document content. An example of embedding an ONIX metadata record in an OOXML file is given in ISO/IEC TR 30114-1:2016 Information technology — Extensions of Office Open XML file formats — Part 1: Guidelines, in Clause 5.4 Embedding foreign Open Packaging Convention (OPC) parts.
|Technical protection considerations||
No specific set of factors for assessing quality and functionality of a spreadsheet format has been developed. This format description uses selected factors for assessing formats for text and datasets.
Some spreadsheets have a printable or viewable report as a primary function. Textual content in cells in XLSX worksheets is conveniently extractable for quotation and for indexing. Full support for Unicode.
|Integrity of document structure||The semantic structure of formulas and their relationship to cells with values is fully represented. Rectangular areas within a worksheet can be identified as tables, with labels for rows and columns.|
|Integrity of layout and display||Excellent support for layout choices. Represents entire layout and formatting as intended by an author who used an application for which XLSX is a native format. Differences in detail can occur on display if the original fonts used are not available in the system used for viewing.|
|Support for mathematics, formulae, etc.||TBD|
|Functionality beyond normal rendering||As a format designed for creating and editing a spreadsheet, XLSX stores information associated with the process of creation and review of spreadsheets, such as comments by multiple authors. Also supported are embedded media objects in binary formats, and links to external media objects, such as images, audio, or video. Note that external objects may be referred to as local files using relative paths or by URIs (or IRIs).|
XLSX does not support strict data-typing as typically supported in database applications and programming languages, e.g., to distinguish integers from floating point numbers, or currency values from other numbers. The display of stored numbers as integers or currency is through display format options.
Any computing system has limits on precision that can be used in calculations. XLSX defines limits on numerical precision in Part 1 §188.8.131.52, basing them on the binary64 double-precision format defined in ISO/IEC/IEEE 60559:2011 for floating-point arithmetic. LibreOffice Calc and Microsoft's .xls format use the same precision. In practical terms, the precision limit is about 15 significant decimal digits. This is insufficient for some forms of statistical analysis (see references below). A further limit, of less significance in practice, applies to magnitude. The upper limit for numeric values is 9.99999999999999E+307 for positive numbers and -9.99999999999999E+307 for negative numbers. This is approximately the same as 1 or -1 followed by 308 zeros.
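The practical effect of the binary64 limit can be illustrated in Python, whose float type is also an IEEE 754 binary64 value. This example shows only the number representation. It does not reproduce how any spreadsheet application rounds or displays values.

```python
import sys

# Python's float is an IEEE 754 binary64 value, the representation the text
# above describes for XLSX numeric calculations.
DIGITS = sys.float_info.dig    # decimal digits guaranteed to survive: 15

raw = "12345678901234567890"   # 20 significant digits, e.g. read from a text file
stored = float(raw)            # nearest binary64 value

print(f"{stored:.0f}")         # 12345678901234567168: the last digits are lost
print(f"{stored:.{DIGITS}g}")  # 1.23456789012346e+19: rounded to 15 significant digits

# The XLSX upper limit quoted above lies below the largest binary64 value.
XLSX_MAX = 9.99999999999999e307
print(XLSX_MAX < sys.float_info.max)  # True
```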
|Support for software interfaces (APIs, etc.)||
There is no API specifically aimed at the use of XLSX to hold a dataset. See Notes on Relationship to CSV below for discussion of widely used support for extracting data from a spreadsheet for use in statistical software. The Open XML SDK can be used for importing or exporting data programmatically, but has no built-in functionality that understands the semantics of observations and measured variables.
|Data documentation (quality, provenance, etc.)||
XLSX and its OPC package have no specific support for rich discipline-specific metadata or codebooks. See Self-documentation in Sustainability Factors above.
|Beyond normal functionality||
An XLSX spreadsheet document can hold not only raw data, but also formulas that perform calculations on that data and present results as numbers or in graphical form.
|Internet Media Type||application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
||From IANA registration.|
|XML namespace declaration||http://schemas.openxmlformats.org/spreadsheetml/2006/main
||This namespace declaration is for the Transitional variant of XLSX. It occurs in the mandatory Main Document part of an XLSX file (package), which usually has the name /xl/workbook.xml. The use of /xl/workbook.xml as the name of the main part is conventional, rather than mandated in ISO 29500.|
||This signifier assumes the usual name of the main part of an XLSX file. The target declaration occurs in the top-level Relationships part (the _rels/.rels part) in the OPC package of a Transitional XLSX file, as the Target attribute of a <Relationship> element within the <Relationships> element. It is the target of a relationship of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. See root namespace and source relationship for Main Document Part in ISO/IEC 29500-4:2012, §10.1.23, which refers to ISO/IEC 29500-1:2012, §12.3.23.|
|Wikidata Title ID||Q26207808
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2012. See https://www.wikidata.org/wiki/Q26207808|
|Wikidata Title ID||Q26207734
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2011. See https://www.wikidata.org/wiki/Q26207734|
|Wikidata Title ID||Q26205771
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2008. See https://www.wikidata.org/wiki/Q26205771|
|Wikidata Title ID||Q26211528
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2012, with Microsoft extensions. See https://www.wikidata.org/wiki/Q26211528|
|Wikidata Title ID||Q26211338
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2011, with Microsoft extensions. See https://www.wikidata.org/wiki/Q26211338|
|Wikidata Title ID||Q26207986
||Office Open XML Spreadsheet Document, Transitional, ISO/IEC 29500:2008, with Microsoft extensions. See https://www.wikidata.org/wiki/Q26207986.|
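The signifier rows above describe how the Main Document part is located through the package-level Relationships part, rather than by assuming the name /xl/workbook.xml. That lookup might be sketched as follows. The function name is hypothetical. The relationship namespace and type are those used by Transitional packages. The sketch assumes the _rels/.rels part has already been read from the package as bytes.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Optional

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def main_part_name(root_rels_xml: bytes) -> Optional[str]:
    """Return the ZIP entry name of the Main Document part named by the
    package-level relationships part, or None if there is no such entry."""
    root = ET.fromstring(root_rels_xml)
    for rel in root.findall(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Targets are relative to the package root (a leading "/" also
            # occurs); ZIP entry names carry no leading slash.
            return posixpath.normpath(rel.get("Target", "")).lstrip("/")
    return None
```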
This description uses filenames (e.g., core.xml) that are used by most, if not all, implementations. As parts are defined by their content type in the mandatory [Content_Types].xml file part, use of these names is conventional rather than mandatory.
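A minimal sketch of resolving a part's content type from [Content_Types].xml. An <Override> entry for a specific part name takes precedence over a <Default> entry keyed by file extension. The function name content_type_of is hypothetical. Part names are compared case-insensitively, as OPC part names are. WORKBOOK_CT is the content type specified for the main part of an .xlsx workbook; templates and macro-enabled workbooks declare different content types.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
WORKBOOK_CT = ("application/vnd.openxmlformats-officedocument."
               "spreadsheetml.sheet.main+xml")


def content_type_of(content_types_xml: bytes, part_name: str) -> Optional[str]:
    """Return the content type declared for part_name (a ZIP entry name such
    as 'xl/workbook.xml'), or None if no declaration applies."""
    root = ET.fromstring(content_types_xml)
    key = "/" + part_name.lstrip("/")
    # An Override names one part explicitly and wins over any Default.
    for o in root.findall(f"{{{CT_NS}}}Override"):
        if o.get("PartName", "").lower() == key.lower():
            return o.get("ContentType")
    # Otherwise fall back to the Default for the part's extension.
    base = key.rsplit("/", 1)[-1]
    ext = base.rpartition(".")[2].lower() if "." in base else ""
    for d in root.findall(f"{{{CT_NS}}}Default"):
        if d.get("Extension", "").lower() == ext:
            return d.get("ContentType")
    return None


# Usage: [n for n in package.namelist() if content_type_of(ct_xml, n) == WORKBOOK_CT]
```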
Relationship between XLSX and binary .xls format: Conversion from the binary .xls format to XLSX using the Save As operation in Microsoft Excel is designed to have 100 percent fidelity. For Excel 2007, the formats should be equivalent. Features added since Excel 2007 will usually not be supported in the binary format; when converting from XLSX to .xls, later versions of Excel will attempt to "down-convert" to supported features and will present a compatibility check that indicates which features will be converted or lost.
Relationship to CSV: The CSV format is a simple textual format for rectangular datasets. Rows represent observations and columns represent the variables measured. The first row may hold labels for the variables. Conversion from CSV to XLSX is straightforward and offered as an import feature by most spreadsheet applications. One caveat with import is that CSV imposes no limit on the precision of numeric values that can be represented. If numbers in a CSV file have more than 15 significant digits, the values will likely be rounded on conversion to XLSX. If an XLSX worksheet consists simply of data values where rows represent observations and columns represent variables, export/conversion to CSV is straightforward and widely offered by spreadsheet applications. If the top row gives variable names, it will typically be exported appropriately. However, the exported file does not distinguish between raw data and calculated values. This loss of semantics may be significant in some contexts.
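The toy sketch below illustrates the loss of semantics noted above. It models a worksheet as rows of (formula, cached value) pairs and writes only the cached values, which is what a CSV export yields. The data and the function name are hypothetical, and no XLSX parsing is involved.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical worksheet: each cell is (formula or None, cached value).
Cell = Tuple[Optional[str], str]
sheet: List[List[Cell]] = [
    [(None, "x"), (None, "y"), (None, "sum")],
    [(None, "2"), (None, "3"), ("A2+B2", "5")],
]


def to_csv(rows: List[List[Cell]]) -> str:
    """Write cached values only; the formula behind C2 is silently dropped."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([value for _formula, value in row])
    return out.getvalue()


print(to_csv(sheet), end="")
# x,y,sum
# 2,3,5
```

In the output, the calculated 5 cannot be told apart from the raw values 2 and 3.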
Conversion between XLSX and ODS: Acknowledging the interest in whether conversion between OOXML and OpenDocument Format (including XLSX and ODS (OpenDocument Format spreadsheet) files) could be reliable, ISO started a work item to explore this issue. ISO/IEC TR 29166:2011 Information technology -- Document description and processing languages -- Guidelines for translation between ISO/IEC 26300 and ISO/IEC 29500 document formats is the output of that expert working group. The 2011 report documents the challenges of translation between OOXML and ODF 1.1 formats, including the spreadsheet formats, based on the standards as documented at the time. This report, available from ITTF, describes features and functionality for the three primary types of office document and characterizes the translatability of features and functions as high, medium, or low. The challenges are significant since the two formats use different underlying models. Although simple documents can be effectively converted, a round-trip to an identical document should never be expected. Display differences will be common after conversion. These are often of no semantic significance, but many result in different spacing or formatting (such as borders and shading). Judging from the ISO/IEC TR 29166 report and Microsoft's documentation, some features appear particularly problematic for conversion and could lead to more substantive problems, particularly if a round trip is desired. These include:
Excel 2013 introduced support for ODF 1.2, including the OpenFormula specification incorporated as Part 2 of ODF 1.2. Microsoft has documented some of the differences between ODS and XLSX as related to opening and saving ODS files in various versions of Excel. Similarly, LibreOffice, in its continually updated Feature Comparison: LibreOffice - Microsoft Office (Spreadsheet Applications), highlights conversion problems relating to its support for XLSX. Highlights from these two lists of incompatibilities in 2014 (using the feature comparison LibreOffice Calc 4.3 vs. Microsoft Excel 2013) included:
In February 2017, the comparison between LibreOffice Calc 5.3 and Excel 2016 mentions:
When considering tools for conversion from OOXML to ODF, it is important to understand which version of ODF is the target. Significant extensions to the standard have been made since ODF 1.1, but ODF 1.1 is the only version that has completed the ISO/IEC standardization process as of August 2014, with some amendments and corrections. ODF 1.2 was approved as an ISO standard in 2015. Office 2013 and 2016 for Windows support export to ODF 1.2, but without change tracking. ODF 1.3 is already in the works, and LibreOffice offers the option to Save As "1.2 Extended." See Wikipedia entry for Open Document Format and ODF Implementer Notes from LibreOffice Development wiki. The compilers of this resource believe that some of the amendments and features added in new versions of ODF are expected to improve the fidelity of conversion when supported in conversion tools but have no direct experience. New editions of ISO/IEC 29500 were published in 2011, 2012, and 2016; however, the changes were primarily corrections and clarifications to reflect XLSX documents as produced in practice. Of more relevance in relation to fidelity of conversion is whether a document includes any of the few new features introduced in recent versions of Excel and marked up in the Markup Compatibility and Extensibility namespace (MCE/OOXML_2012). Microsoft has documented these extensions in [MS-XLSX]: Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format. Among the application features that depend on such extensions are: Power Query, Power View, Data Models (a new approach for integrating data from multiple tables, effectively building a relational database inside an Excel workbook), Slicers (a new device for configuring Pivot Tables), and Timelines (a special filter type for Pivot Tables).
In June 2014, Google announced direct editing of XLSX files in the updated Google Sheets app on Android devices, or online if using the Chrome browser with an extension Google supplies. The compilers of this resource have found no good information yet on the degree to which files edited in these versions of Google Sheets can be opened satisfactorily in Excel. Other spreadsheet applications can both import and export/download XLSX files, but round-tripping without loss should not be expected. These include LibreOffice, Apple's Numbers app, and Google Sheets when using browsers other than Chrome. Apache OpenOffice can import XLSX files, but some loss should be expected.
The original XLSX specification was published in ECMA-376, Part 1 in 2006. Between then and 2012, the main change to the specification for SpreadsheetML was the split between Strict (as defined in Part 1) and Transitional (as defined in Part 4 in conjunction with Part 1). Changes to SpreadsheetML in editions of ISO/IEC 29500 and ECMA-376 between 2008 and 2016 have primarily been corrections and clarifications, with a single exception related to how dates are stored.
Late in the ISO standardization process for OOXML, a proposal was made to adopt the ISO 8601 format for dates and times in spreadsheets. Dates and times in spreadsheets have usually been stored as numbers (sometimes termed "serial date-time" values). These use less space in memory or files, are convenient for common date-based calculations, and can easily be presented in a user-specified display format (following local conventions and using different scripts). A ballot resolution meeting held votes on the outstanding proposals for the OOXML format. The experts present at that ISO 29500 Ballot Resolution Meeting were primarily experts in XML and in textual documents rather than in spreadsheets (see Why do we need serial dates in the Transitional form of IS 29500?, a 2009 blog post). The details of the proposal as approved had several shortcomings. Spreadsheet experts recognized these once ISO 29500:2008 was published and software developers began to build tools. Firstly, no existing applications would be able to recognize and handle dates in the ISO 8601 format if they were included in XLSX Transitional files, as permitted by the published standard. See, for example, Losing data the silent way - ISO8601-dates. The intent of the Transitional variant of ISO 29500 was to be compatible with the existing corpus of .xlsx documents and the applications designed to handle them. For that reason, an amendment to Part 4 to disallow ISO 8601 dates in the Transitional variant was introduced. Secondly, ISO 8601 is a very flexible format. Any use in a context that aims at interoperability needs to be specific about which particular textual string patterns are expected for dates and times. An amendment was therefore introduced to specify particular string patterns for use in XLSX Strict, selected from the variety allowed by ISO 8601. The associated amendments to Parts 1 and 4 were approved in December 2011 and incorporated into ISO 29500:2012.
The changes were almost entirely in the text of the standard, with minimal changes to the schemas for SpreadsheetML, apart from disallowing the date cell-type in XLSX Transitional. The compilers of this resource are not aware of any SpreadsheetML implementations that would have generated XLSX Transitional files with dates in the ISO 8601 textual form that is no longer compliant with ISO 29500-4. Comments welcome.
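For readers handling serial date-time values, the conversion to an ISO 8601 string can be sketched as below. The function name is hypothetical. The sketch assumes the default 1900 date system; a workbook can instead declare the 1904 system, which uses a different epoch. Because the 1900 system treats 1900 as a leap year, the sketch handles only serial values from 61 (1 March 1900) onward.

```python
from datetime import datetime, timedelta

# Day zero that is valid for serial values >= 61 in the 1900 date system.
EPOCH_1900 = datetime(1899, 12, 30)


def serial_to_iso8601(serial: float) -> str:
    """Convert a 1900-date-system serial date-time value to ISO 8601.

    The integer part counts days and the fraction is the time of day.
    Values below 61 are rejected: serial 60 is the nonexistent 29 February
    1900, so earlier values need a one-day correction not handled here.
    """
    if serial < 61:
        raise ValueError("serial values before 1 March 1900 are not handled")
    moment = EPOCH_1900 + timedelta(days=serial)
    return moment.isoformat(timespec="seconds")


print(serial_to_iso8601(44927))    # 2023-01-01T00:00:00
print(serial_to_iso8601(45000.5))  # 2023-03-15T12:00:00
```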
See also Notes/History for OOXML_Family.