Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | ODF Family. OASIS name: Open Document Format for Office Applications (OpenDocument). ISO name: ISO/IEC 26300, Information technology -- Open Document Format for Office Applications (OpenDocument) |
---|---|
Description |
This description is an overview of the family of formats defined by ISO/IEC 26300: Information technology -- Document description and processing languages -- Open Document Format for Office Applications (OpenDocument) and the corresponding family of Open Document Format (ODF) specifications from OASIS. The formats in the ODF family are XML-based, application-independent, and platform-independent file formats for editable documents. The specifications have been developed and are maintained by OASIS (Organization for the Advancement of Structured Information Standards). The ODF specifications are intended to support document authoring, editing, viewing, exchange and archiving for text documents, spreadsheets, presentation graphics, drawings, charts and similar documents commonly created or used in personal productivity software applications. In addition to being XML-based and designed to support editing, requirements in the charter for the Open Document Format technical committee included:
Content categories: The ODF family of formats includes subtypes for documents in different content categories, such as word-processing, spreadsheet, and presentation file formats. Documents comprise related component files bundled in a container/wrapper format called a "package." A typical ODF document has the following components:
Chronological versions: OASIS has developed and published four versions of the ODF specification. Version 1.0 was published in May 2005, version 1.1 in February 2007, and version 1.2 in September 2011. Changes between ODF 1.0 and 1.1 were relatively minor, apart from extensions to address accessibility concerns. On this site, the plan is, over time, to develop descriptions for the package formats and categorical subtypes based on ODF 1.1 (also covering ODF 1.0), and separate descriptions for the ODF 1.2 versions. The OASIS Open Document Format for Office Applications (OpenDocument) TC announced approval for a Committee Specification for ODF 1.3 on December 4, 2020; see Open Document Format for Office Applications (OpenDocument) v1.3 from the OpenDocument TC approved as a Committee Specification. For the history of the format prior to submission for standardization through OASIS, and for more detail on ISO standardization for the later ODF versions, see History Notes below.
ODF package formats: An ODF document is always in a package that aggregates constituent components of a document (or other type of content) into a single object. The main ODF package specification is based on the ZIP format as defined in APPNOTE.TXT from PKWARE (see ZIP_PK). For details of the ODF package formats, see ODF_package_1_1 for ODF 1.1 and ODF_package_1_2 for ODF 1.2. ZIP permits use of various compression and encryption algorithms. Compression in ODF files is restricted to the "deflate" algorithm. Encryption mechanisms as defined in APPNOTE.TXT are not permitted. A single encryption mechanism is specified for ODF_package_1_1, and the approach is updated in the ODF 1.2 specification (ODF_package_1_2) to allow stronger encryption. Digital signatures are not supported in ODF 1.1; support for digital signatures was added in ODF 1.2. See Notes below for more detail on versions of ZIP referenced by different versions of ODF.
Also, see Notes below for information about an alternative wrapper format, a "flat" Single OpenDocument XML file format. |
Production phase | An ODF document can be used in any production phase. However, textual ODF documents (.odt) and presentations (.odp) are often converted to a static format rather than an editing format for final publication or archiving. |
Relationship to other formats | |
Has subtype | ODF_text_1_2, OpenDocument Text Document Format (ODT), Version 1.2, ISO 26300-1:2015 |
Has subtype | ODF_spreadsheet_1_2, OpenDocument Spreadsheet Document Format (ODS), Version 1.2, ISO 26300-1:2015 |
Has subtype | ODF_presentation_1_2, OpenDocument Presentation Document Format (ODP), Version 1.2, ISO 26300-1:2015 |
Has subtype | ODF_draw_1_2, OpenDocument Drawing Document Format (ODG), Version 1.2, ISO 26300-1:2015 |
Has subtype | ODF_chart_1_2, OpenDocument Chart Document Format (ODC), Version 1.2, ISO 26300-1:2015 |
Has subtype | ODF_dbfront_1_2, OpenDocument Database Front End Document Format (ODB), Version 1.2, ISO 26300-1:2015 |
Subtype of | ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE). Various features of the ZIP File Format are not permitted in ODF. ODF 1.2 uses ZIP 6.2.0 as the normative basis for the package specification. See Notes below for more detail on versions referenced by earlier ODF specifications. |
Contains | META-INF/manifest.xml file. This manifest file, not described separately in this resource, is mandatory in all ODF packages. |
Has earlier version | OpenOffice.org 1.0 format, a precursor to ODF, not described on this site at this time. |
Has later version | ODF 1.2 Extended formats, variants with additional markup supported by ODF implementations, for example LibreOffice and Apache OpenOffice, since OpenOffice 3.2. |
Has subtype | ODF_package_1_1, OpenDocument Package Format, ODF 1.1. Includes ODF 1.0, not described separately at this site. |
Has subtype | ODF_package_1_2, OpenDocument Package Format, ODF 1.2 |
Defined via | XML_1_0, XML (Extensible Markup Language) 1.0. ODF specifications include normative RELAX NG schemas. |
LC experience or existing holdings | As of late 2020, the Library of Congress had around 52,000 files with the extensions in the ODF family in its digital collections, for a total size of around 2 gigabytes. These files come from many different sources, including archived websites and files acquired by the Manuscript Division in collections of "papers" from individuals or organizations. |
---|---|
LC preference | The Library of Congress Recommended Format Statement (RFS) lists ODF as an acceptable format for textual works in digital form and electronic serials. In general, the Library of Congress prefers formats intended for final publication of textual works, rather than editable formats. Editable formats will be found in collections of papers of organizations and individuals. |
Disclosure | International open standard. Maintained by OASIS Open Document Format for Office Applications (OpenDocument) TC. To summarize from the OASIS FAQ, OASIS promotes the development and adoption of open standards, using transparent governance and operating procedures, and offering a range of membership levels to support an inclusive, international, and balanced member base. After approval as an OASIS standard, ODF specifications have been submitted to ISO/IEC for approval by JTC1/SC34/WG6 as parts of ISO/IEC 26300. |
---|---|
Documentation |
Specifications from OASIS: Open Document Format for Office Applications (OpenDocument) Specification. For earlier versions, see Format specifications below or the list on the page for the OASIS OpenDocument TC. Specifications published as ISO/IEC 26300, with the latest versions available on https://standards.iso.org/ittf/PubliclyAvailableStandards/:
|
Adoption |
As of 2020, office software suites using ODF as native file format include: LibreOffice, Collabora, Apache OpenOffice, and Calligra. LibreOffice and Apache OpenOffice (AOO) derive from a split in 2010, when Sun Microsystems, sponsor of the open source OpenOffice.org, was acquired by Oracle and some developers left to form LibreOffice. Oracle later licensed the OpenOffice codebase to the open source Apache organization. See Notes below for more detail on the relationship between the AOO and LibreOffice projects in 2015. As of 2015, both AOO and LibreOffice installed with a default format variant described as ODF 1.2 Extended. See Notes/History below for more on ODF 1.2 Extended. KOffice, an important product during the development of ODF, is no longer supported; its website was taken down in 2012. The Calligra suite developed from a fork of KOffice in 2010. The KDE project and organization, formerly associated with the KOffice software application, continues, and the Calligra Development Wiki is part of the KDE Community Wiki.
Microsoft Office 2013, Office 2016, and Office 365 permit the choice of ODF 1.2 or 1.1 as the default format for new files; when editing such files, features supported in OOXML but not in ODF are hidden from users and no translation between ODF and OOXML occurs.
The compilers of this resource believe that these suites offer three independent codebases that use the ODF format as the underlying native format: OpenOffice/LibreOffice/Collabora; Calligra/KDE; and Microsoft. Additionally, since the fork between OpenOffice and LibreOffice, significant differences between their codebases have been introduced. Comments welcome.
Other office applications that support ODF include: AbiWord (open source), which has a plug-in; and Corel WordPerfect Office Suite (since version X4).
Products descended from early adopters of ODF include: IBM Connections Docs, NeoOffice (which started as a Mac-oriented fork from OpenOffice.org), and SoftMaker Office (which runs on Windows, MacOS, and Linux). Two suites formerly available as independent products are no longer maintained: StarOffice and IBM Lotus Symphony. Code from these products was made available to Apache OpenOffice. A number of government bodies have adopted the ODF family of formats as mandatory or recommended for documents which must be editable to support collaboration within the government or between the government and the public or other entities. The Wikipedia page on OpenDocument Adoption is intended to document evaluation or adoption of ODF by governments. However, the page is clearly not actively maintained. Examples of governmental policy documents that mandate ODF among editable documents include:
Policy documents that include ODF among acceptable formats for exchange of editable documents with governments include:
A sampling of support and categorization by archival institutions follows:
Despite official mandates and recommendations, adoption of ODF formats has been slow, particularly in the U.S. See, for example, The Long Slog to Level the Document Playing Field from January 2015. |
Licensing and patents | No concerns. The specification is provided by the OASIS Open Document Format for Office Applications (OpenDocument) TC using the OASIS IPR royalty-free model entitled "RF on Limited Terms." See the OASIS Intellectual Property Rights (IPR) Policy and IPR statements from Sun Microsystems, Inc. specific to technology contributed to the ODF project. |
Transparency |
The XML-based files defined by the ODF schema are both human-readable and machine-processable. The mixed-content markup model, the prefixes, and the names for elements and attributes in the specification contribute to human readability. See Notes below on mixed-content markup. Transparency of the ZIP-based container corresponds to that of ZIP_PK. It depends upon algorithms and tools to interpret and extract contents. It would require sophistication to build tools from scratch, but many tools exist. Transparency ultimately depends on the files contained in the package. Files may be encrypted. Binary files, such as image files, may be included in the document package. |
Self-documentation |
Pre-defined metadata elements for the document as a whole include:
The pre-defined elements are all optional and repeatable. However, applications are not required to update multiple occurrences in a specific way to reflect modifications to a document. In addition to pre-defined elements, ODF 1.0/1.1 specified markup to allow additional user-defined elements to be expressed by name and value. ODF 1.0/1.1 also provided a mechanism for custom metadata, by indicating that all content in an office:meta element should be preserved. In ODF 1.2, the earlier mechanism for custom metadata was deprecated in favor of using new markup specified for incorporating RDF-based metadata. |
External dependencies | Depends on files contained in the package. |
Technical protection considerations | Encryption is supported for files within an ODF package. In addition, an ODF package file may be encrypted during interchange or as part of DRM controlling distribution. |
Other | |
---|---|
See note | Quality and functionality factors will be considered for subtypes for specific content categories. See for example, ODF_text_1_2. |
Tag | Value | Note |
---|---|---|
Filename extension | See note. | ODF package files use extensions appropriate to the type of document packaged. Hence, .odt, .odp, and .ods are all extensions used for ODF packages. |
Internet Media Type | See note. | ODF package files use MIME types appropriate to the type of document packaged, using the pattern application/vnd.oasis.opendocument.xxxxx. The registered MIME types are listed in Annex C of the ODF 1.3 specification. For example, the MIME type for an ODF text document is application/vnd.oasis.opendocument.text. See ODF_text_1_2. The MIME type for a Flat ODF file is found in an attribute for the top-level <office:document> element. |
Magic numbers | See related format. | See ZIP_PK. |
Indicator for profile, level, version, etc. | See note. | The four root elements used in the primary files in an ODF package all permit an attribute that records the ODF version, e.g., "1.0" or "1.2". In a Flat ODF file the version is in an attribute for the top-level <office:document> element. |
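As a rough illustration of the two Flat ODF notes above, the sketch below reads the version and MIME type from the root element of a Flat ODF document using Python's standard XML parser. The sample document is hypothetical and far from complete, and the helper name flat_odf_info is an assumption made for this example; the attribute names office:version and office:mimetype are those used in the ODF schema.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def flat_odf_info(xml_text):
    """Return (version, mimetype) from the root of a Flat ODF document.

    Hypothetical helper: it checks only the root element name and the
    two attributes; it does not validate the document.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{OFFICE_NS}}}document":
        raise ValueError("root element is not office:document")
    version = root.get(f"{{{OFFICE_NS}}}version")
    mimetype = root.get(f"{{{OFFICE_NS}}}mimetype")
    return version, mimetype


# Hypothetical, minimal Flat ODF text document used only for illustration.
SAMPLE = (
    '<office:document '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'office:version="1.2" '
    'office:mimetype="application/vnd.oasis.opendocument.text">'
    '<office:body><office:text/></office:body>'
    '</office:document>'
)

print(flat_odf_info(SAMPLE))
```

Either attribute may be absent in files found in the wild, in which case the helper returns None for that value rather than guessing.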
General |
Namespaces defined in ODF specifications and schemas: Namespaces defined for versions 1.0 and 1.1 of ODF are shown with their default prefixes in subclause 1.3 of the ODF 1.1 specification. The namespaces use the pattern urn:oasis:names:tc:opendocument:xmlns:XXXXX:1.0, where XXXXX is the default prefix. For example, the namespace urn:oasis:names:tc:opendocument:xmlns:office:1.0, with the associated default prefix office:, is used for common pieces of information not contained in a more specific namespace. Namespaces for ODF 1.2 are listed in subclause 1.5 Namespaces of Part 1 of the ODF 1.2 specification. They are the same as for ODF 1.1 apart from the introduction of a new namespace with default prefix db: for the new ODF subtype for a database front-end. Subclause 1.5 Namespaces of Part 3 of the ODF 1.3 Committee Specification 02 does not introduce any new namespaces. The XML Namespace Document for OpenDocument Version 1.3 states that "It is the intent of the OASIS Open Document Format for Office Applications (OpenDocument) TC that XML namespace names identified by URI references will not change arbitrarily with each subsequent revision of the corresponding WSDL or XML Schema documents but rather change only when a subsequent revision, published in conjunction with a Committee Specification Draft, results in non-backwardly compatible changes from a previously published Committee Specification Draft." Namespaces and default prefixes are defined for different types of content, e.g., text:, table:, chart:, etc. For convenience, the default prefixes are used in this discussion rather than the full namespace URIs. The same convention is used in the ODF specifications. A namespace is not constrained to a particular document category, but usable in any appropriate context. For example, the elements and attributes with the table: prefix are used for tables in text documents and for tables in spreadsheets. 
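The namespace pattern and the reuse of one namespace across document categories can be sketched in a few lines of Python. The helper odf_namespace and the sample fragment are assumptions made for illustration; the fragment is not schema-valid, and namespaces that ODF borrows from other standards do not follow this pattern.

```python
import xml.etree.ElementTree as ET


def odf_namespace(prefix):
    """Build an ODF namespace URI from a default prefix (hypothetical helper).

    Applies only to namespaces that follow the pattern described above.
    """
    return f"urn:oasis:names:tc:opendocument:xmlns:{prefix}:1.0"


# Prefix-to-URI map for the three namespaces used in the sample.
NS = {p: odf_namespace(p) for p in ("office", "text", "table")}

# Illustrative fragment: a table: element inside a text document body.
SAMPLE = (
    f'<office:text xmlns:office="{NS["office"]}" '
    f'xmlns:text="{NS["text"]}" xmlns:table="{NS["table"]}">'
    '<text:p>Intro</text:p>'
    '<table:table table:name="T"/>'
    '</office:text>'
)

root = ET.fromstring(SAMPLE)
tables = root.findall("table:table", NS)
paragraphs = root.findall("text:p", NS)
print(len(tables), len(paragraphs))
```

Passing the prefix map to findall lets the code use the same default prefixes as the specification instead of spelling out each URI.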
Namespaces defined in ODF 1.1 for different types of content included:
Additional namespaces and default prefixes are used for elements and attributes that relate to ODF packages and generic ODF features, including meta:, odf:, office:, style:, manifest: and number: (for display styling for numbers). Two namespaces are used primarily to hold application-specific content: config:, for application settings and script:, for macros. The ODF specification does not prescribe any particular macro language. It permits the inclusion of scripts in any language and allows the language for any embedded script to be declared. ODF 1.2 introduced a new namespace, with prefix db:, designed to support use as a stand-alone database front-end document and to allow documents of other categories to import data from databases, for example to support mail-merge or dynamic update of charts from external data. Standards from which ODF borrows: XML-based standards and the prefixes used in ODF are:
Mixed-content markup: A simple example of an ODF paragraph demonstrates the readability of mixed-content markup:
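The sketch below shows how mixed content of this kind looks to a general-purpose XML parser. The paragraph wording is an illustrative stand-in chosen for this example, and the runs helper is hypothetical; it handles only one level of nested text:span elements.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_ATTR = f"{{{TEXT_NS}}}style-name"

# Illustrative paragraph: character data interleaved with styled spans.
PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT_NS}" text:style-name="Standard">'
    'This is a <text:span text:style-name="T1">bold</text:span> and '
    '<text:span text:style-name="T2">italic</text:span> example.'
    '</text:p>'
)


def runs(paragraph):
    """Yield (style name, text) pairs in document order.

    Text directly inside the paragraph takes the paragraph's style name;
    text inside a span takes the span's style name.
    """
    base = paragraph.get(STYLE_ATTR)
    if paragraph.text:
        yield base, paragraph.text
    for child in paragraph:
        if child.text:
            yield child.get(STYLE_ATTR), child.text
        if child.tail:
            yield base, child.tail


p = ET.fromstring(PARAGRAPH)
for style, text in runs(p):
    print(style, repr(text))

# The plain text is recovered by concatenating every text node.
print("".join(p.itertext()))
```

The need to track both text and tail for each child element is one small instance of the extra bookkeeping that mixed content imposes on programmers.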
The definitions for the named styles (Standard, T1 and T2) will be stored in the separate styles.xml file. For a comparison with the markup for the same paragraph in OOXML, see OpenDocument vs Microsoft OpenXML - Part II, a page from the website of the OpenDocument Fellowship, an organization that used to promote ODF, but was inactive by 2013. Although human readability is an advantage and can make it easier for programmers to understand the structure when developing code to parse or display the content, mixed content has disadvantages too. See Mixed content myopia for the opinion of a programmer who points out some challenges. More directly related to ODF, if a paragraph element allows mixed content, but it is necessary to apply attributes to a piece of text within the paragraph, for example, when tracking changes, then it is necessary to put that text into its own element, which detracts from the readability.
ZIP versions used for ODF: Version 1.0 as published by OASIS used a reference to the latest version of appnote.txt from PKWARE, which was subject to unannounced updates by PKWARE. Versions 1.0 from ISO/IEC and 1.1 from OASIS referred to an unofficial variant of PKWARE's appnote.txt from Info-ZIP. This adaptation of PKWARE's specification explicitly removed some capabilities for encryption and other services patented by PKWARE. The Info-ZIP software was used widely as a library for working with an interoperable open subset of the ZIP format. The open source Info-ZIP project has not seen much action since early 2009, although betas have been released since then. ODF 1.2 uses PKWARE's version 6.2.0 of appnote.txt [see ZIP_6_2_0]. The compilers of this resource are not aware of substantive differences in the intent of the ZIP specifications in ODF 1.0-1.2 or among software implementations creating ODF files. Comments welcome.
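Because the package is ordinary ZIP, general-purpose tools can inspect it. The sketch below builds a tiny, hypothetical package in memory and checks three things: the content of the mimetype entry, the presence of META-INF/manifest.xml, and whether every entry is either stored or deflated. The function names and placeholder file contents are assumptions for illustration, and the mimetype entry is a convention of the ODF package specifications rather than something described in the text above. This is not a conformance checker.

```python
import io
import zipfile

ODT_MIME = "application/vnd.oasis.opendocument.text"


def build_sample_package():
    """Create a tiny, hypothetical ODF-like package in memory.

    The XML payloads are placeholders, not valid ODF content.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", ODT_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", "<placeholder/>",
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("META-INF/manifest.xml", "<placeholder/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def inspect_package(data):
    """Report basic package facts for ZIP bytes (illustrative checks only)."""
    allowed = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        names = [info.filename for info in infos]
        mimetype = None
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii")
        return {
            "mimetype": mimetype,
            "has_manifest": "META-INF/manifest.xml" in names,
            # Entries using a method other than "stored" or "deflate".
            "unexpected_compression": [
                info.filename for info in infos
                if info.compress_type not in allowed
            ],
        }


report = inspect_package(build_sample_package())
print(report)
```

To examine a real file, the same function can be given the bytes read from disk, for example the contents of a .odt file opened in binary mode.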
Flat ODF format: Although ODF documents are almost always packaged in ZIP files, a format that is a Single OpenDocument XML file is specified. The specification is in clause 2.1 (Document Roots) in the specification for ODF 1.1 and in clause 3.1.2 (Single OpenDocument XML file) in ODF 1.2 Part 1 and ODF 1.3 Part 3. This format is often termed Flat ODF. The root of a Flat ODF file is the <office:document> element. In contrast, a packaged ODF document has several separate XML files in its ZIP-based structure with root elements: <office:document-content>; <office:document-styles>; <office:document-meta>; and <office:document-settings>. There are advantages to the flat representation for parsing, validation, document comparison, and programmatic generation with basic XML tools, as argued by Fridrich Štrba in Flat ODF as the Swiss Army Knife.
Relationship between Apache OpenOffice and LibreOffice: Although the codebases for LibreOffice and Apache OpenOffice (AOO) have a common heritage, they have grown apart since the 2010 split. The fact that the organizations opted for different open source licenses means that although LibreOffice can absorb code from the AOO project, the reverse is not true. According to a blog post by Chris Hoffman at How-To Geek, "The Apache OpenOffice project uses the Apache License, while the LibreOffice uses a dual LGPLv3 / MPL license. The practical result is LibreOffice can take OpenOffice’s code and incorporate it into LibreOffice — the licenses are compatible. ... the two different licenses only allow a one-way transfer of code. LibreOffice can incorporate OpenOffice’s code, but OpenOffice can’t incorporate LibreOffice’s code." Daniel Brunner, head of the IT department of Switzerland's Federal Supreme Court, has argued that the two projects should merge back together, as reported in Open and Libre Office projects should reunite in September 2014.
In response, Reuniting LibreOffice and AOO – a personal take, a blog post by Charles-Henri Schulz, who was heavily involved in OpenOffice and ODF standardization before the split and later in building support for LibreOffice, concludes that the projects themselves have little interest in merging and that without such interest there would be no benefit. His personal view is clearly that arguments made by outsiders for a reconciliation are flawed. LibreOffice, OpenOffice, and rumors of unification, another blog post response, this one by Bruce Byfield, concludes that "the idea of unification should be shelved as unworkable." He includes the sentence, "In other words, for some reason, development of OpenOffice has all but stalled, while LibreOffice remains an active project." In 2020, LibreOffice released a major upgrade (7.0, with support for ODF 1.3); Apache OpenOffice released two maintenance releases, 4.1.7 and 4.1.8. |
---|---|
History |
History prior to ODF standardization: The charter stated, "Since the OpenOffice.org XML format specification was developed to meet these criteria and has proven its value in real life, this TC will use it as the basis for its work. ... In the first phase, this TC will use proven and established constructs so that the resulting standard can satisfy the immediate needs of many users, as well as serve as a base for future, less restricted development. ... In the second phase, this TC will maintain the specification delivered in phase 1 and will extend it to encompass additional areas of applications or users, which may also include adapting the specification to recent developments in office applications." The first phase resulted in a Committee Draft in March 2004. The second phase, of maintenance and extension, is ongoing. After publication of ODF 1.0 by OASIS in May 2005, ODF 1.0 was submitted to ISO/IEC JTC1/SC34 for ISO standardization in late 2005 and published as ISO/IEC 26300:2006 in late 2006. A detailed chronology for the development of ODF prior to publication of version 1.0 by OASIS follows:
The chronology above was pieced together from a variety of sources, including Rob Weir's 2007 blog post with a timeline, Introduction to OpenDocument Format (from IBM, apparently from 2006), Office opens its doors (July 2006 article in the Guardian newspaper), Office Productivity Suite Competitive Analysis (by Martijn W.H. Dekkers, 2002), From Open Source to Open Standard: The OASIS OpenDocument Format (presented by Michael Brauer at XTech 2005: XML, the Web and beyond, April 2005), and Open by Design: The Advantages of the OpenDocument Format (2006 OASIS white paper). After the OASIS approval of ODF 1.0, the specification was submitted to ISO/IEC JTC1/SC34 in late 2005, with approval as ISO/IEC 26300 in May 2006 and publication in late 2006. Maintenance of ODF continues to be through the OASIS Open Document Format for Office Applications (OpenDocument) TC.
History since ODF standardization through ISO: OASIS has developed and published two additional versions of the ODF specification. Version 1.1 was published in February 2007, and version 1.2 in September 2011. Changes between ODF 1.0 and 1.1 were relatively minor, apart from extensions to address accessibility concerns. A 2012 amendment to ISO/IEC 26300:2006 brought the ISO standard into alignment with ODF 1.1. ODF 1.2 was approved by ISO/IEC JTC1/SC34/WG6 in September 2014 and published as an ISO/IEC standard in June 2015. ODF 1.2 introduced several substantive extensions: digital signatures, RDF-based metadata, and OpenFormula for spreadsheet formulas.
History of OpenOffice implementations after standardization through ISO: According to the Wikipedia entry for OpenOffice.org, "Versions 2.0–2.3.0 default to the ODF 1.0 file format; versions 2.3.1–2.4.3 default to ODF 1.1; versions 3.0 onward default to ODF 1.2." No later than OpenOffice 3.2 (released in February 2010), new versions of OpenOffice (and successors AOO and LibreOffice) have installed with a default format described as ODF 1.2 Extended.
From OpenOffice.org 3.2 New Features, "As OpenOffice.org 3.2 currently requires a superset of the ODF 1.2 specification, the software now warns users when ODF 1.2 Extended features have been used." LibreOffice provides both an explanation for this practice and detailed documentation on its extensions.
ODF 1.3 was approved as an OASIS Committee Specification in December 2020, according to a December 4, 2020 announcement. This followed several periods of public review in 2019 and 2020. The next stage in the multi-step OASIS process is to gather three "statements of use", written statements that a party has successfully used or implemented the specification. See Approval of an OASIS Standard. The specification for ODF 1.3 has been re-organized into four Parts. Part 1 is a brief introduction; Part 2 is the Packages specification; Part 3 defines the OpenDocument Schema, which includes specifications for the ODF content subtypes; and Part 4 defines the Recalculated Formula (OpenFormula) Format. The Packages specification has been updated to support more secure digital signatures and encryption. Most of the other changes are for corrections and clarifications, particularly aimed at improved interoperability between implementations. |
|