Sustainability of Digital Formats: Planning for Library of Congress Collections


Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012

Format Description Properties Explanation of format description terms

Identification and description

Full name Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
Description

OPC, Open Packaging Conventions, defines a generic "container" format designed to contain a collection of files (termed "parts" in the OPC specification) that represent a single logical whole. Rather than being a format intended for use as is, it is a format with generic structures intended to serve as the basis for a more refined specification for a particular package type. Generic features include a structure for addressing parts (component files) and expressing relationships among parts in a way that allows applications to understand the technical nature of the content (media types included) and navigate the relationships without opening component files. A specification based on OPC would establish particular naming conventions and semantics for parts, and also relationship types together with their semantic definitions. For example, the initial use of the OPC package has been as a container for the word-processing, spreadsheet, and presentation formats defined in other parts of ISO/IEC 29500, and produced as .docx, .xlsx, and .pptx documents by Microsoft Office products since 2007. The specification in ISO/IEC 29500-1 defines part names and relationships used by these three formats.

This description focuses on the OPC format as specified in part 2 of ISO/IEC 29500:2012, Information technology -- Document description and processing languages -- Office Open XML File Formats (OOXML). However, since this specification has very few changes since the format was first standardized as ECMA-376, Part 2 in 2006, the description can be read as applying to all versions published by ECMA International and by ISO/IEC through 2012. See Notes below for more detail on the chronological versions and minor differences. In this format description, the names OPC and OPC/OOXML_2012 should be considered equivalent.

An OPC package is a container that holds a collection of parts. The purpose of the package is to aggregate constituent components of a document (or other type of content) into a single object. The OPC specification describes an abstract model and a single physical format based on the ZIP File Format [ZIP_6_2_0]. As shown in a diagram on page 1 of Open XML: The Markup Explained by Wouter van Vugt, the Open Packaging Conventions build on the core technologies of XML, Unicode, and ZIP. The OPC specification incorporates schemas for expressing relationships among parts, for identifying the content types for parts, and for storing digital signatures for parts. There is also a schema for describing the package as a whole through "core properties," which uses selected Dublin Core metadata elements in addition to some OPC-specific elements. In order to support efficient characterization of an OPC container without extracting all its parts, the content type of each part must be expressed in the form of a MIME type (Internet Media type), either individually or by use of a default type based on the file extension, in a special stream called [Content_Types].xml. Another requirement is that all parts of the package must be discoverable by following relationships. A file called _rels/.rels defines a relationship to the main part (perhaps a document), and each part can have an associated .rels file with relationships to any embedded or associated files.
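The content-type and relationship mechanisms described above can be sketched with Python's standard library. This is a minimal illustration, not a conforming OPC reader: the function names are hypothetical, part names are compared case-sensitively (a simplification), and the two namespace URIs are the ones commonly seen in Office-produced packages.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as commonly found in [Content_Types].xml and .rels files.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def content_types(pkg):
    """Map each part name to its media type: Override first, then Default."""
    root = ET.fromstring(pkg.read("[Content_Types].xml"))
    defaults = {d.get("Extension").lower(): d.get("ContentType")
                for d in root.findall(f"{{{CT_NS}}}Default")}
    overrides = {o.get("PartName"): o.get("ContentType")
                 for o in root.findall(f"{{{CT_NS}}}Override")}
    result = {}
    for name in pkg.namelist():
        if name == "[Content_Types].xml" or name.endswith("/"):
            continue  # the content-types stream itself is not a part
        part = "/" + name
        ext = posixpath.splitext(name)[1].lstrip(".").lower()
        result[part] = overrides.get(part, defaults.get(ext))
    return result


def relationships(pkg, source="/"):
    """Return (type, target) pairs for a source part; "/" means the package."""
    if source == "/":
        rels_name = "_rels/.rels"
    else:
        folder, base = posixpath.split(source.lstrip("/"))
        rels_name = posixpath.join(folder, "_rels", base + ".rels")
    if rels_name not in pkg.namelist():
        return []
    root = ET.fromstring(pkg.read(rels_name))
    found = []
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        target = rel.get("Target")
        if rel.get("TargetMode") != "External":
            # Internal targets are resolved relative to the source part.
            base_dir = "/" if source == "/" else posixpath.dirname(source)
            target = posixpath.normpath(posixpath.join(base_dir, target))
        found.append((rel.get("Type"), target))
    return found
```

Calling relationships(pkg) on an open zipfile.ZipFile yields the package-level relationships; passing a resolved target back in as the source walks one level deeper, which is how all parts become discoverable.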

OPC is based on version 6.2.0 of the ZIP File Format as defined in APPNOTE.TXT, Version 6.2.0 (ZIP_6_2_0). Compression in OPC is restricted to the "deflate" algorithm; encryption mechanisms as defined in APPNOTE.TXT are not permitted. The digital signature mechanism defined in APPNOTE.TXT is not permitted, but OPC/OOXML_2012 provides an alternate mechanism for optional digital signatures for parts in an OPC package.
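As a rough illustration of these ZIP restrictions, the sketch below flags entries that use a compression method other than deflate or that have the ZIP encryption flag set. The function names are hypothetical, and treating stored (uncompressed) entries as acceptable is an assumption of this sketch rather than something stated above. It does not check for ZIP-level digital signatures.

```python
import zipfile

# Assumption: stored (method 0) is tolerated alongside deflate (method 8).
ALLOWED_METHODS = (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)


def entry_problems(info):
    """List the ways a single ZIP entry departs from the restrictions."""
    problems = []
    if info.compress_type not in ALLOWED_METHODS:
        problems.append(f"compression method {info.compress_type}")
    if info.flag_bits & 0x1:  # bit 0 of the general purpose flag: encrypted
        problems.append("encrypted")
    return problems


def package_problems(source):
    """Map entry name to its problems; an empty dict means none were found."""
    report = {}
    with zipfile.ZipFile(source) as z:
        for info in z.infolist():
            problems = entry_problems(info)
            if problems:
                report[info.filename] = problems
    return report
```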

Production phase An OPC package container can be used in any production phase. Original use was for Office Open XML office documents, which are certainly created (initial state), exchanged for editing and review (middle-state) and may be published (final-state) in an OPC container.
Relationship to other formats
    Subtype of OOXML_Family, OOXML (ISO/IEC 29500) Format Family
    Subtype of ZIP_6_2_0, ZIP File Format, Version 6.2.0 (PKWARE). Various features of the ZIP File Format are not permitted in OPC. Details on the use of ZIP in OPC are in section 10 and Annex C of ISO/IEC 29500-2:2012.
    Contains [Content_Types].xml part. This part is mandatory in all OPC packages. It contains a list of the MIME types and extensions for all of the other parts in the package.
    May contain Various specified parts conforming to XML Schema specifications included in ISO/IEC 29500-2:2012, including Relationships Parts or a single Core Properties Part. Various XML-based parts associated with digital signatures may also be included.
    Has subtype DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 1-5
    Has subtype XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 1-5
    Has subtype PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 1-5
    Has subtype DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 2-5
    Has subtype XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 2-5
    Has subtype PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2016; ECMA-376, Editions 2-5
    Has subtype VSDX, Visio VSDX Drawing File Format. The default Visio drawing format since Visio 2013.
    Has subtype Other application-specific formats based on the OPC package. Microsoft has made further use of OPC: NuGet, an extension for Microsoft Visual Studio that provides an interface for managing third-party libraries for .NET projects, uses OPC to package source code modules. The SMPTE Media Package (ST 2053:2011) is based on OPC; see Notes below for a brief description of ST 2053 and how it refines OPC. AutoCAD's Design Web Format .dwfx is also based on OPC.
    Defined via XML_Schema_1_0, W3C XML Schema 1.0. Structural elements of the OPC container that refine the ZIP format are defined using XML Schema (.xsd) specifications. Equivalent RELAX NG schemas are also provided.

Local use

LC experience or existing holdings See individual subtypes.
LC preference See individual subtypes.

Sustainability factors

Disclosure International open standard. Maintained by ISO/IEC JTC1 SC34/WG4. Originated by Microsoft Corporation and first standardized through ECMA International in 2006. Approval as part 2 of ISO/IEC 29500 was in 2008.
    Documentation

ISO/IEC 29500-2, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 2: Open Packaging Conventions. Latest version (2012 as of February 2017) is available from ISO/IEC Publicly Available Standards.

All editions of the OOXML standards as published by ECMA are available from ECMA-376: Office Open XML File Formats. See Notes below for a chronology.

Adoption

OPC/OOXML_2012 was originally developed by Microsoft as a container format for documents produced by its Office products. Since Office 2007, the default formats for word-processing, presentations, and spreadsheets (.docx, .pptx, .xlsx, respectively) have been OPC packages. In addition, OPC is the basis for some other Microsoft package formats, including Microsoft's XML Paper Specification (.xps), later modified and standardized as Open XML Paper Specification ECMA-388 (using .oxps as extension), and the Visio VSDX format (using .vsdx as an extension). The current .dwfx format for AutoCAD Design Web Format files, designed for distributing fixed versions of design drawings and supplemental materials for review, is based on OPC and compatible with XPS. Another use of OPC is as the package format for SMPTE's Media Package for Storage Distribution and Playback of Multimedia File Sets and Internet Resources (ST 2053:2011). The compilers of this resource have been unable to determine how the SMPTE package is being used. Comments welcome. See Open Packaging Conventions page on Wikipedia for a fuller list of formats based on OPC.

Windows includes two different libraries for handling OPC containers: a COM-based API and a managed .NET-based API. In addition, an open source platform-independent library, libopc, using the C programming language, has been released. Libopc comes with command line tools opc_dump and opc_extract which will, respectively, dump the structure of an OPC container and extract parts from an OPC container. Libopc also handles MCE (Markup Compatibility and Extensibility), another part of the OOXML standard that may be employed to support extensions to XML-based formats using the OPC container as a basis. In addition, most ZIP tools will unpack an OPC container if the file extension is changed to .zip.
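Because the physical format is ZIP, a general-purpose ZIP library can also list and unpack the entries of an OPC container. The following minimal sketch uses Python's zipfile module, which does not depend on the file extension, so no renaming is needed. The function names are hypothetical, and the sketch works at the ZIP level only; it does not interpret content types or relationships.

```python
import zipfile


def list_parts(path):
    """Return (name, uncompressed size, compressed size) for each ZIP entry."""
    with zipfile.ZipFile(path) as z:
        return [(i.filename, i.file_size, i.compress_size)
                for i in z.infolist()]


def extract_parts(path, dest_dir):
    """Unpack every entry of the container into dest_dir."""
    with zipfile.ZipFile(path) as z:
        z.extractall(dest_dir)
```

For example, list_parts("report.docx") (a hypothetical file name) would return one tuple per entry, including [Content_Types].xml and the relationship files.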

In June 2014, Microsoft released the Open XML SDK as open source. Among other capabilities, this SDK provides the ability to open OPC packages, extract parts, and construct new OPC packages.

    Licensing and patents The specification originated from Microsoft Corporation. OPC/OOXML_2012 and future versions of ISO/IEC 29500-2 and ECMA-376 are covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).
Transparency

As a container, transparency corresponds to that of ZIP_PK. It depends upon algorithms and tools to interpret and extract contents. It would require sophistication to build tools from scratch, but many tools exist. The parts/files that represent the structure of the OPC package are all in XML and thus both human readable and easily machine processable. Transparency ultimately depends on the files contained in the package.

Self-documentation

The built-in support for metadata is limited. OPC has an optional part with the name Core Properties. This part has fifteen elements, all optional and all non-repeatable. Six are selected from the main Dublin Core Metadata Initiative (DCMI) set (title; creator; description; subject; identifier; language) and two from the supplementary dcterms vocabulary (created; modified).
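A reader might locate and read the Core Properties part along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the relationship type URI is the one commonly seen in Office-produced packages, and element names are reduced to their local names, so the Dublin Core and OPC-specific elements are not distinguished.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Assumed relationship type for the Core Properties part.
CORE_REL = ("http://schemas.openxmlformats.org/package/2006/"
            "relationships/metadata/core-properties")


def core_properties(pkg):
    """Return {local element name: text} from the Core Properties part."""
    if "_rels/.rels" not in pkg.namelist():
        return {}
    rels = ET.fromstring(pkg.read("_rels/.rels"))
    target = None
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == CORE_REL:
            # Package-level targets resolve against the package root.
            target = rel.get("Target").lstrip("/")
            break
    if target is None:
        return {}  # the part is optional
    root = ET.fromstring(pkg.read(target))
    props = {}
    for child in root:
        local_name = child.tag.rsplit("}", 1)[-1]  # drop the namespace
        props[local_name] = (child.text or "").strip()
    return props
```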

Specifications that build on OPC to define more specific package formats can permit or require the inclusion of richer metadata appropriate for a particular context. For example, any XML-based metadata representation can easily be included as an OPC part to describe the package as a whole and given a relationship from the Core Properties part (as indicated in ISO/IEC 29500-2:2012, §11.3). Some package specifications might define mechanisms for attaching descriptive or structural metadata to subsidiary content parts.

External dependencies Depends on files/parts contained in the OPC package.
Technical protection considerations Encryption is not permitted within the OPC package. However, an OPC container may be encrypted, and some applications that use this container format as the basis for a more specific format may use encryption during interchange or DRM for distribution.

Quality and functionality factors

Other
Bundling/compression Separate functionality factors for comparing formats that are used to bundle and/or compress files have not been developed. From the perspective of digital preservation, consideration of the sustainability factors above is more important than the degree of compression.

File type signifiers and format identifiers

Tag Value Note
Filename extension See note.  OPC does not specify an extension. Formats that use OPC as a basis are expected to adopt their own file extensions. Hence, .docx, .pptx, .xlsx, .dwfx, .xps, and .oxps are all extensions for OPC packages.
Magic numbers See related format.  See ZIP_PK
Other See note.  A file called [Content_Types].xml and a folder called _rels are mandatory in the ZIP-based OPC container.
Wikidata Title ID Q3353182
Open Packaging Conventions, ISO/IEC 29500-2. See https://www.wikidata.org/wiki/Q3353182
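
Since OPC has no extension or magic number of its own, a tool might recognize a candidate package by combining the signifiers listed above: a valid ZIP file that contains a [Content_Types].xml entry and a _rels folder. The sketch below is a heuristic only; the function name is hypothetical, names are matched case-sensitively (a simplification), and a positive result does not establish conformance.

```python
import zipfile


def looks_like_opc(source):
    """Heuristic: ZIP container with [Content_Types].xml and a _rels folder."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as z:
        names = z.namelist()
    has_types = "[Content_Types].xml" in names
    has_rels = any(name.startswith("_rels/") for name in names)
    return has_types and has_rels
```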

Notes

General

The most significant difference between ECMA-376, Part 2, 1st edition and all later versions of this specification, including ISO/IEC 29500-2 versions dated 2008 through 2012, relates to permitting part names in the package to be IRIs (Internationalized Resource Identifiers) as defined in RFC 3987, not simply URIs as defined in RFC 3986. The compilers of this resource are unable to determine how widely this feature has been supported by producing implementations.

Other differences among the specification versions to which this description applies are mainly small corrections and clarifications, with the underlying format being unaffected.

The XML 1.0 specification allows use of Document Type Definitions (DTDs), which can enable internal entity expansion, a process which has been exploited to generate Denial of Service attacks. As mitigation for this potential threat, DTD declarations are not permitted in the XML markup in an OPC package.
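
One way a consumer could enforce this restriction is to stop parsing as soon as a DOCTYPE declaration is seen, before any entity is expanded. The sketch below uses the expat bindings in Python's standard library; the function and exception names are hypothetical, and this is an illustration of the idea rather than a prescribed validation procedure.

```python
import xml.parsers.expat


class DTDNotAllowed(ValueError):
    """Raised when XML markup carries a DOCTYPE declaration."""


def reject_dtd(xml_bytes):
    """Parse the markup and raise DTDNotAllowed at the first DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts the parse before entity expansion can begin.
        raise DTDNotAllowed(f"DOCTYPE declaration found: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
```

Markup without a DOCTYPE passes silently; malformed markup still raises the parser's own xml.parsers.expat.ExpatError.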

A SMPTE Media Package, as specified in SMPTE ST 2053:2011, is a container based on OPC. Designed to take advantage of dynamic content delivery in a multiplatform, online environment, the Media Package specification defines XML-based files for the management and playback of media essence files, and other types of files that may be useful for the description or presentation of the essence files. To quote from the specification, "Media Packages are useful for storage and electronic distribution of multiple files in a single container where multiple files are required to provide, for example, multiple resolutions, bitrates, codecs, content protection systems, languages, versions, episodes, collections, albums, metadata, and interactive presentation applications." Essences may be stored in the container or referred to by URIs for download when needed. The Media Package specification supplements the OPC specification by defining: a mandatory Table of Contents file/part, and optional parts for Presentations, TrackContainers, MediaPackage Metadata, and DRMLicense data. XML schemas to support the specification are incorporated in the standard. A SMPTE Media Package may also hold Media Applications, "a broad term intended to include a variety of presentation control programs ranging from simple play lists, declarative data, and markup languages; to procedural language programs that are interpreted by players or virtual machines, or compiled to binary to run on specific processors." Examples of Media Applications cited are Flash, Silverlight, Java, and HTML.

History

Since the original OPC specification was published as ECMA-376, Part 2 in 2006, there has been no change in the OPC format that interferes with backward compatibility. Editions of ISO/IEC 29500-2 and ECMA 376 through 2012 define the same package format; changes to OPC have been limited to clarifications and corrections. The chronology of editions specifying OPC/OOXML_2012 is:

  • ECMA-376, Part 2, 1st edition (December 2006)
  • ISO/IEC 29500-2:2008
  • ECMA-376, Part 2, 2nd edition (December 2008) [specification identical to ISO/IEC 29500-2:2008]
  • ISO/IEC 29500-2:2011
  • ECMA-376, Part 2, 3rd edition (June 2011) [specification identical to ISO/IEC 29500-2:2011]
  • ISO/IEC 29500-2:2012
  • ECMA-376, Part 2, 4th edition (December 2012) [specification identical to ISO/IEC 29500-2:2012]

Last Updated: 06/04/2017