Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Office Open XML (OOXML, ISO/IEC 29500, ECMA 376) Format Family |
---|---|
Description |
This description is an overview of the family of formats defined by ISO/IEC 29500: Information technology -- Document description and processing languages -- Office Open XML File Formats and the corresponding ECMA 376 specifications. This family of XML-based formats was designed by Microsoft to match the functionality of the proprietary binary formats that had been the default formats in Microsoft Office applications (Word, Excel, and PowerPoint) through Office 2003, and to be fully compatible with the existing corpus of documents. In December 2005, a Technical Committee of Ecma International (TC45) was established to review documentation for the proposed Office Open XML specification submitted by Microsoft. The committee incorporated expertise from large customers using Microsoft Office for enterprise systems, software vendors already developing products to read, write, and transform office documents, and archival institutions with an interest in long-term preservation. The resulting document was approved as ECMA 376 in December 2006 and was then submitted for standardization through ISO/IEC JTC 1 in early 2007. Approval as ISO/IEC 29500 followed in early 2008. ISO/IEC 29500 incorporated many detailed changes and a restructuring of the parts. One important change was to separate the specifications for markup that supports the functional requirements of the three main content categories from the specifications for elements and attributes that support backwards compatibility and legacy formats. Legacy markup was documented in a new part under the title Transitional Migration Features. Files that comply with ISO/IEC 29500 Part 1 are termed "Strict" and files that comply with Part 4 (which is structured as textual modifications to Part 1) are termed "Transitional." The primary members of the OOXML Format Family are document formats for the key office productivity content categories: word processing documents (DOCX, using WordprocessingML), spreadsheets (XLSX, using SpreadsheetML), and presentations (PPTX, using PresentationML).
Key supporting members of the format family include the Open Packaging Conventions (OPC), specified in Part 2 of ISO/IEC 29500, and Markup Compatibility and Extensibility (MCE), specified in Part 3.
WordprocessingML, SpreadsheetML, and PresentationML are specified in Parts 1 and 4 of ISO/IEC 29500. Part 1 defines the Strict variant of these three formats. Part 4, written as a supplement to Part 1, specifies additional markup to support compatibility with various legacy applications. The Transitional variant of each of these formats allows markup documented in Part 4 in addition to that documented in Part 1. In addition to the three main markup languages (MLs), the standard defines several supporting markup languages and schemas: DrawingML, which includes markup for graphical elements in any of the three main document types, including embedded images, vector graphics for diagrams, and analytical charts derived from data in a document; Office Math Markup Language (OMML), which supports the display of mathematics in the context of applications that support collaborative editing and tracked changes within mathematical expressions; a schema for bibliographies; and several supporting schemas for document properties (core, extended, and custom). Part 4 includes the specification for VML, a deprecated graphics language superseded by DrawingML. Several closely related formats are covered completely or in large part by the ISO/IEC 29500 and ECMA 376 specifications. These include documents used as templates for other documents and macro-enabled variants of the primary content types. See Notes below. |
Production phase | OOXML can be used in any production phase for office documents, as they are created (initial state), exchanged for editing and review (middle-state), and published (final-state). |
Relationship to other formats | |
Subtype of | OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO/IEC 29500-2:2008-2012 |
Has subtype | DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses WordprocessingML in an OPC/OOXML_2012 package. |
Has subtype | DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in DOCX Transitional to support backwards compatibility. |
Has subtype | XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses SpreadsheetML in an OPC/OOXML_2012 package. |
Has subtype | XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in XLSX Transitional to support backwards compatibility. Permits storage of dates in a profile of the ISO 8601 date and time format. |
Has subtype | PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012, ECMA-376. Uses PresentationML in an OPC/OOXML_2012 package. |
Has subtype | PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500-1:2008-2012, ECMA-376, Editions 2-4. Disallows legacy markup permitted in PPTX Transitional to support backwards compatibility. |
Has subtype | Other application-specific formats fully defined by OOXML as specified in ISO/IEC 29500. These include .dotx, .potx, and .xltx, template files as used and produced by Microsoft Office products since Office 2007. The template formats are not described separately at this web site; they are essentially identical to the corresponding document formats, which are described as subtypes and linked above. |
Has modified version | Other application-specific formats closely related to OOXML. These include .docm, .pptm, and .xlsm files, macro-enabled formats as produced by Microsoft Office products since Office 2007. The macro-enabled variant formats are not described at this web site at this time. See Notes below for more on Microsoft Office use of macros. |
May contain | MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2015 |
Subtype of | ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE) |
Defined via | XML, Extensible Markup Language (XML) |
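The MCE mechanism listed above lets a producer declare namespaces as ignorable, so that a consumer can skip markup it does not understand. The following is a minimal sketch of one small piece of MCE processing, not a full implementation: it reads the `mc:Ignorable` attribute on a part's root element, resolves its prefixes through the namespace declarations, and reports which ignorable namespaces a given consumer does not understand. It uses only the Python standard library; the function name and the `http://example.com/hypothetical-extension` namespace are illustrative assumptions. Other MCE constructs, such as `mc:AlternateContent`, are not handled here.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Markup Compatibility and Extensibility (MCE) vocabulary.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def unknown_ignorable_namespaces(xml_bytes, understood):
    """Return namespace URIs listed in the root element's mc:Ignorable that
    the consumer does not understand (i.e. that are not in `understood`)."""
    prefixes = {}  # prefix -> URI, from declarations on the root element
    ignorable = ""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, item in events:
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        else:
            # The first "start" event is the root element: read mc:Ignorable and stop.
            ignorable = item.get(f"{{{MC_NS}}}Ignorable", "")
            break
    # mc:Ignorable holds whitespace-separated prefixes; resolve them to URIs.
    uris = [prefixes[p] for p in ignorable.split() if p in prefixes]
    return [uri for uri in uris if uri not in understood]


sample = (
    b'<root xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    b' xmlns:x="http://example.com/hypothetical-extension"'
    b' mc:Ignorable="x"/>'
)
print(unknown_ignorable_namespaces(sample, understood=set()))
# prints ['http://example.com/hypothetical-extension']
```

A consumer that receives a non-empty result may ignore elements and attributes in those namespaces rather than rejecting the file, which is the interoperability behavior MCE is designed to allow.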
LC experience or existing holdings |
The Library of Congress was represented on ECMA TC45 during the initial standardization processes and has continued to be active in the format's maintenance through SC34/WG4. For other aspects of experience with OOXML, see individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012. |
---|---|
LC preference |
The Library of Congress Recommended Format Statement (RFS) lists OOXML as an acceptable format for textual works in digital form and electronic serials. See also individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012. |
Disclosure | Family of formats based on international open standard. Maintained by ISO/IEC JTC1 SC34/WG4. Originated by Microsoft Corporation and first standardized through ECMA International in 2006. Approval as ISO/IEC 29500 was in 2008. |
---|---|
Documentation |
ISO/IEC 29500, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Parts 1-4. Latest version (2016 as of May 2020) is available from ISO/IEC Publicly Available Standards. All editions of the OOXML standards as published by ECMA are available from ECMA-376: Office Open XML File Formats. See Notes below for a chronology. Annex L of Part 1 is a Primer (informative rather than normative) that introduces key features of the constituent markup languages, including WordprocessingML, SpreadsheetML, and PresentationML, relating elements and attributes to intended functionality through examples. |
Adoption |
Widely adopted. According to a March 2020 post from CIODive, "Microsoft owns nearly 90% of the office suite market, or email and authoring market, as Gartner calls it. Google holds onto just over 10%, but is gaining about 1% market share annually." See individual subtypes for more detail. In addition to end-user applications that support OOXML, some open-source software libraries are available. libOPC provides support for reading and writing OPC packages and for processing markup using the Markup Compatibility and Extensibility (MCE) mechanisms defined in the standard. In June 2014, Microsoft released its Open XML SDK (first released in 2007) as open source. Apache POI, the Java API for Microsoft Documents, provides some open-source support for OOXML documents, but acknowledges that not all features are handled, with XLSX support being "most developed." As applications have introduced support for OOXML, some developers have run into interoperability problems. Many of these have been forwarded as defect reports to the working group maintaining ISO/IEC 29500 and resolved either through clarifications or small corrections in new editions of the OOXML standard, or through statements by Microsoft documenting how Word, Excel, and PowerPoint vary from the standard in [MS-OI29500] (Office Implementation Information for ISO/IEC 29500 Standards Support) and [MS-OE376] (Office Implementation Information for ECMA-376 Standards Support). |
Licensing and patents |
The specification originated from Microsoft Corporation. Current and future versions of ISO/IEC 29500 and ECMA-376 are covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification). |
Transparency |
For transparency of the package containing the constituent parts of an OOXML file, see OPC/OOXML_2012. Inside the OPC package, the OOXML formats are XML-based and inherently more transparent than their binary predecessors. See individual subtypes for more detail.
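Because an unencrypted OOXML file is a ZIP archive of mostly XML parts, its structure can be inspected with generic tools. As a minimal sketch, assuming an unencrypted package and using only Python's standard `zipfile` module, the function below lists each part with its uncompressed size and a rough flag for whether it is XML. The function name `list_parts` is hypothetical, and the XML flag is a file-name heuristic, not a reading of `[Content_Types].xml`.

```python
import zipfile


def list_parts(package):
    """Return (part name, uncompressed size, looks_like_xml) for each part.

    `package` is a path or binary file object for an unencrypted OPC package,
    which is an ordinary ZIP archive.
    """
    parts = []
    with zipfile.ZipFile(package) as pkg:
        for info in pkg.infolist():
            if info.is_dir():
                continue  # directory entries are not package parts
            # Heuristic by name: .xml and .rels parts are XML; media parts
            # (images, embedded objects) are usually binary.
            looks_like_xml = info.filename.endswith((".xml", ".rels"))
            parts.append((info.filename, info.file_size, looks_like_xml))
    return parts
```

For a typical DOCX file (for example a hypothetical `report.docx`), the list would usually include `[Content_Types].xml`, `_rels/.rels`, and `word/document.xml`, each readable in any text editor once extracted.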
Self-documentation |
See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats.
Accessibility Features |
The OOXML Format Family provides moderate support for accessibility. See subtypes for specifics.
External dependencies |
None, beyond XML-aware software. See individual subtypes, in particular OPC/OOXML_2012, the package format that is used by the other formats. |
Technical protection considerations | As of 2014, encryption is not permitted within the OPC package [OPC/OOXML_2012] used to wrap all OOXML documents. However, an OPC package may itself be encrypted, and some applications that use this container format as the basis for a more specific format may use encryption during interchange or DRM for distribution. |
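In practice, an OOXML file encrypted by Microsoft Office is not a ZIP archive at all. As described in Microsoft's [MS-OFFCRYPTO] documentation, the encrypted package is stored inside an OLE Compound File Binary (CFB) container, the same container used by the legacy binary Office formats. The following minimal sketch performs a first-pass check on the leading signature bytes. The function names and return labels are hypothetical. A `"cfb"` result means only "legacy binary file or possibly encrypted OOXML"; telling the two apart would require further inspection, for example looking for an `EncryptedPackage` stream.

```python
# Leading signature bytes of the two containers relevant here.
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP local file header (unencrypted OPC package)
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE Compound File Binary header


def sniff_container(data: bytes) -> str:
    """Classify a file by its first bytes: 'zip', 'cfb', or 'unknown'."""
    if data.startswith(ZIP_SIGNATURE):
        return "zip"  # candidate unencrypted OOXML package
    if data.startswith(CFB_SIGNATURE):
        return "cfb"  # legacy binary format or encrypted OOXML; inspect further
    return "unknown"


def sniff_file(path) -> str:
    """Read only the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

Checking the container before attempting to unzip a file lets a preservation workflow route encrypted files for separate handling instead of reporting them as corrupt ZIP archives.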
Other | |
---|---|
See individual subtypes |
See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012. |
Tag | Value | Note |
---|---|---|
Filename extension | docx xlsx pptx |
And extensions used by other formats based on the OOXML specifications. See Notes below. |
Internet Media Type | application/vnd.openxmlformats-officedocument.wordprocessingml.document application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.openxmlformats-officedocument.presentationml.presentation |
From IANA assignments site. |
File signature | See related format. | See ZIP_PK. |
XML namespace declaration | http://schemas.openxmlformats.org/.../2006/main http://purl.oclc.org/ooxml/.../main |
The first pattern is for the Transitional variants of the three OOXML document types, with the ellipses being replaced by wordprocessingml, spreadsheetml, or presentationml as appropriate. The second pattern applies to the Strict variants of the three document types, with the same substitutions for the ellipses. |
Other | NF00311 |
See NARA File Format Preservation Plan ID https://www.archives.gov/files/lod/dpframework/id/NF00311.ttl for Microsoft Word for Windows 2007-onwards (OOXML). Note that NARA does not specify versions. |
Pronom PUID | fmt/189 |
See http://www.nationalarchives.gov.uk/PRONOM/fmt/189. |
Wikidata Title ID | See note. | The Wikidata:WikiProject Informatics/File formats resource provides records for a large number of OOXML subtypes. See Wikidata:WikiProject Informatics/File formats/Lists/File formats. See also descriptions of subtypes on this website, for example, DOCX/OOXML_2012, XLSX/OOXML_2012, PPTX/OOXML_2012. |
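The namespace patterns in the table above can be used to tell Strict and Transitional files apart. The following minimal sketch assumes an unencrypted package and uses only the Python standard library; the function name `ooxml_variant` is hypothetical. It follows the package-level relationship in `_rels/.rels` whose type ends in `/officeDocument` to locate the main part (for example, `word/document.xml`). It then compares the namespace of that part's root element against the two URI prefixes from the table. It is a heuristic based on the root namespace only and does not validate the rest of the markup.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# URI prefixes taken from the namespace patterns in the table above.
TRANSITIONAL_PREFIX = "http://schemas.openxmlformats.org/"
STRICT_PREFIX = "http://purl.oclc.org/ooxml/"


def _local(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def _namespace(tag):
    """Return the namespace URI of an ElementTree tag, or '' if none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def ooxml_variant(package):
    """Return 'strict', 'transitional', or 'unknown' for an unencrypted package."""
    with zipfile.ZipFile(package) as pkg:
        rels = ET.fromstring(pkg.read("_rels/.rels"))
        target = None
        for rel in rels:
            # Strict and Transitional relationship types both end in /officeDocument.
            if _local(rel.tag) == "Relationship" and rel.get("Type", "").endswith("/officeDocument"):
                target = rel.get("Target")
                break
        if target is None:
            return "unknown"
        # Package-level relationship targets are resolved from the package root.
        part_name = posixpath.normpath(target.lstrip("/"))
        root = ET.fromstring(pkg.read(part_name))
    ns = _namespace(root.tag)
    if ns.startswith(STRICT_PREFIX):
        return "strict"
    if ns.startswith(TRANSITIONAL_PREFIX):
        return "transitional"
    return "unknown"
```

Reading the relationship instead of hardcoding `word/document.xml` lets the same function handle DOCX, XLSX, and PPTX files, whose main parts live in different folders.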
General |
File extensions used: The extensions listed below are commonly used. They are not defined in ISO/IEC 29500, but most of them are specified in the many MIME type registrations made with IANA, using the pattern application/vnd.openxmlformats-officedocument.... Macro-enabled versions of the primary document types and of templates do not technically comply with the OOXML standard, which does not permit the use of macros. However, these files do use the OPC/OOXML_2012 package specification, and many parts of the package follow the official standard. See the note below for more on macro-enabled files, which are documented by Microsoft.
Macro-enabled variants: The macro-enabled variants of the OOXML formats are straightforward extensions of the primary formats without macros. They use OPC/OOXML_2012 as a package, simply adding a few parts that contain the macros and associated data. Macros for Office are written in Visual Basic for Applications (VBA). Note that macros do not work in Office for Mac 2008. In Office for Mac 2011 (the latest version as of 2014), macros are supported. However, not all macros originally written for a Windows version of Office will run on a Mac without modification, because the implementations of VBA differ between the Windows and Mac (OS X) versions of Office; macros that use ActiveX or other Windows-specific features are examples. The additional parts for macros are defined in three supplementary documents:
Note that the VBA Editor, which is used to develop macros and is distributed as part of desktop Office applications for Windows (but hidden from users by default), can export VBA macro code as a .BAS file (a regular text file). Microsoft applications offer a Save As option that drops the macro-related parts from a macro-enabled file and creates a regular document. This is commonly done to archive a snapshot of a document or spreadsheet in which macros are used to update the file, perhaps based on external data. Apache OpenOffice and LibreOffice offer options for handling macros when importing .xlsm files. Neither application can run all VBA macros as-is, although, according to Using Microsoft Office and LibreOffice in late 2014, "recent versions of LibreOffice can run some Visual Basic scripts" if the feature is enabled. |
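Because macro-enabled variants add their VBA project as extra parts in the OPC package, a quick triage step is to look for such a part. This minimal sketch assumes an unencrypted package. It also assumes that the VBA project is stored in a part whose name ends in `vbaProject.bin`, which is the name Microsoft Office typically uses (for example, `xl/vbaProject.bin`). A more robust check would follow the package relationships and `[Content_Types].xml` rather than rely on part names. The function name is hypothetical.

```python
import zipfile


def vba_part_names(package):
    """Return part names that look like a VBA project.

    Heuristic (assumption): Office usually names the part '.../vbaProject.bin'.
    `package` is a path or binary file object for an unencrypted OPC package.
    """
    with zipfile.ZipFile(package) as pkg:
        return [name for name in pkg.namelist() if name.lower().endswith("vbaproject.bin")]
```

A non-empty result suggests that a file with a .docx, .xlsx, or .pptx extension may in fact be a renamed macro-enabled file, which is worth flagging before ingest.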
---|---|
History |
The first XML-based formats for Word and Excel were included in the release of Office 2003; these were flat XML files, partially documented on the Microsoft Developer Network (MSDN) site. These formats were precursors to OOXML, with both similarities and significant differences. The original OOXML specification was published as ECMA-376 in 2006. The primary difference between that version and the version published as ISO/IEC 29500:2008 was the split between the Strict variants of DOCX, XLSX, and PPTX (as specified in Part 1) and the Transitional variants (as defined in Part 4 in conjunction with Part 1). All versions since ISO/IEC 29500:2008 specify essentially the same format. The editions published by ISO/IEC in 2011 and 2012 consisted primarily of clarifications and corrections. In particular, modifications to Part 4 (Transitional Migration Features) have been intended to ensure that the specification corresponds to the corpus of existing documents and that interoperability between existing applications is improved rather than disrupted. See individual subtypes, particularly DOCX/OOXML_2012, XLSX/OOXML_2012, and PPTX/OOXML_2012, for more detail on the three primary OOXML document types. The chronology of editions specifying the OOXML family of formats is:
A new edition of Part 3 of the specification, for Markup Compatibility and Extensibility, was published in early 2015. The intent of the update was to clarify the text and to emphasize the applicability of MCE beyond OOXML to support interoperability. The new edition does not introduce new features but does remove some flexibility that had not been exploited in practice and was deemed unnecessary. Most importantly, it makes the process for handling MCE on file import much clearer.
Another chronology of relevance to digital archivists is the support for OOXML formats in different versions of the Office software. See Office File Formats Overview from 2016 for a Microsoft summary of the chronology. Files created by the Microsoft Office applications have a /docProps/app.xml part that contains properties for the document as a whole, including <Application> and <AppVersion>. Values for AppVersion are numeric, representing internal version numbers used by Microsoft during development. The integer parts of the AppVersion values in files created by versions of Microsoft Office are: 12 = Windows Office 2007 or Office for Mac 2008; 14 = Windows Office 2010 or Office for Mac 2011; 15 = Windows Office 2013 or Office for Mac 2016; 16 = Windows Office 2016.
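The following minimal sketch reads these two properties. It assumes an unencrypted package with the extended-properties part at `docProps/app.xml`, the usual location in files written by Microsoft Office; a stricter reader would locate the part through the package relationships. Elements are matched by local name so that the same code works with both the Transitional and Strict namespaces. The lookup table simply transcribes the list above, and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Integer part of AppVersion -> Office release, transcribed from the list above.
APPVERSION_MAJOR = {
    12: "Windows Office 2007 or Office for Mac 2008",
    14: "Windows Office 2010 or Office for Mac 2011",
    15: "Windows Office 2013 or Office for Mac 2016",
    16: "Windows Office 2016",
}


def app_properties(package):
    """Return (Application, AppVersion) from docProps/app.xml.

    Either value is None if the element is missing; both are None if the
    part itself is absent.
    """
    with zipfile.ZipFile(package) as pkg:
        try:
            root = ET.fromstring(pkg.read("docProps/app.xml"))
        except KeyError:  # zipfile raises KeyError for a missing member
            return None, None
    values = {}
    for child in root:
        local = child.tag.rsplit("}", 1)[-1]  # ignore the namespace
        if local in ("Application", "AppVersion"):
            values[local] = (child.text or "").strip()
    return values.get("Application"), values.get("AppVersion")


def describe_app_version(app_version):
    """Map an AppVersion string such as '12.0000' to the releases listed above."""
    if not app_version:
        return "unknown"
    try:
        major = int(app_version.split(".", 1)[0])
    except ValueError:
        return "unknown"
    return APPVERSION_MAJOR.get(major, "not in list")
```

As the note that follows explains, the integer part alone cannot always distinguish Windows from Mac releases, so the `Application` value should be reported alongside it.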
Note that although versions of Office dated 2016 were released for both Windows and the Mac OS, they do not declare the same AppVersion value; the Windows and Mac versions of Office do not have identical codebases. Tests of the most recent iPad version of Excel in December 2014 and February 2017 revealed "Microsoft Macintosh Excel" as the value for Application and "15.0300" as the value for AppVersion. Thus, it appears that the iPad versions of Office apps are related to Office for Mac. The compilers of this resource have not had the opportunity to check AppVersion values in files created using Office 365 or Android apps. Comments welcome. Starting with Office 2016, Microsoft has strongly encouraged subscription-based access and frequent updates, and format support may be adjusted in updates. For example, Office for Mac 2016 (first released in mid-2015) introduced support for export to ODF in the June 2016 update.
A new edition of ISO/IEC 29500-2, Part 2: Open packaging conventions, was published in 2021. This edition preserves all functionality of the previous edition and adds no new functionality, but it has been extensively reorganized and brought into line with ISO practices and the other specifications in the OOXML family. Where appropriate, it now uses undated or more recent versions of standards as normative references. Particular areas that have been clarified relate to the use of non-ASCII characters in the names of parts in a package and the application of digital signatures. |