Sustainability of Digital Formats: Planning for Library of Congress Collections


Microsoft Office PowerPoint 97-2003 Binary File Format (.ppt)


Identification and description

Full name Microsoft Office PowerPoint 97-2003 Binary File Format (.ppt).
Description

The Microsoft PowerPoint Binary File format, with the .ppt extension and referred to here as PPT, was the default format used for documents in Microsoft PowerPoint from PowerPoint 97 (released in 1997) through Microsoft Office 2003. Although it cannot support all functionality of the PowerPoint application introduced since PowerPoint 2007, the PPT format has continued to be available as an alternative to the PPTX/OOXML format, standardized in ISO/IEC 29500, for saving document files in PowerPoint. As of late 2019, the documentation from Microsoft for File formats that are supported in PowerPoint lists "PowerPoint 97-2003 Presentation." [Note: In other contexts, the same format has been called "PowerPoint 97-2004 Presentation" and "PowerPoint 97-2007 Presentation".]

The conceptual structure of a slide-based presentation includes:

  • presentation - sequence of slides intended to be viewed by an audience.
  • slide - frame containing one or more pieces of text and/or graphics, an optional transition that precedes display of the slide, and optional animation of slide content
  • slide layout - organization of elements on a slide.
  • note - slide annotation, reminder, or piece of text intended for the presenter or the audience.
  • handout - printed set of slides that can be handed out to an audience for future reference.
  • comment - text attached to a location on a slide, visible while editing or reviewing a presentation, but not during a slideshow.

According to the Wikipedia entry for Microsoft PowerPoint, "Early versions of PowerPoint, from 1987 through 1995 (versions 1.0 through 7.0), evolved through a sequence of binary file formats, different in each version, as functionality was added." Starting with PowerPoint 4.0 in 1994, the format has been based on Microsoft's Compound File Binary file format as specified in [MS-CFB], also referred to as an OLECF (object linking and embedding compound file). By 1997, a stable binary format with the .ppt extension had emerged and was the default file format used by PowerPoint 97 through PowerPoint 2003 for Windows, and in PowerPoint 98 through PowerPoint 2004 for Mac (i.e., in PowerPoint versions 8.0 through 11.0). This format description is for this last format. For convenience, the term "PPT" will be used here to refer specifically to this variant of the Microsoft PowerPoint files with .ppt as extension.

Although the PPT format is proprietary, it has been covered by Microsoft's Open Specification Promise since 2007. The specification released in 2007 is available as Microsoft Office PowerPoint 97-2007 Binary File Format Specification [*.ppt]. The structure for the PPT format has been documented and kept up-to-date in [MS-PPT].

The CFB format, on which the PPT format is based, provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data. It consists of storages, streams, and substreams. See the CFB specification in [MS-CFB]. A PPT file begins with a CFB header and should include a CFB root directory (identified by the name "Root Entry" in UTF-16). [Note: When the compilers of this resource created a PPT file using PowerPoint 11 for Mac, the name of the root directory was simply "R". This abbreviated name is mentioned in the Reference for the open source OleFile parser (under get_rootentry_name). Comments welcome.] The root directory has entries for each stream or storage object at the top level of the compound file hierarchy. Each object entry has a name (also encoded in UTF-16, although most of the document content may be stored in 1-byte characters) and points to the location in the file for the named object.

Mandatory streams in a PPT file include a stream with the name "PowerPoint Document" and a "Current User" stream. The PowerPoint Document stream holds the structure of the presentation and most of its content in a hierarchy of records. Record types include DocumentContainer (which includes a list of slides), while SlideContainer and NotesContainer records hold the presentation content associated with individual slides. The SlideContainer record has pointers to other records within the PowerPoint Document stream that specify text, vector graphics, and transitions associated with a slide, and possibly to content objects stored elsewhere in the file or externally (using the linking capabilities of the CFB structure). Embedded images are stored in an optional Pictures stream. The [MS-PPT] specification suggests in 1.3.5 External Objects that audio may be embedded in the file or invoked from an external file via a link, but that video is always linked rather than embedded. The MainMasterContainer record specifies the overall color scheme and layout, and a HandoutContainer record specifies the formatting and options for a printed handout. The Current User stream contains the name of the current user and some other technical details used by the Microsoft PowerPoint application.
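To make the directory structure concrete, the following Python sketch lists the names found in the first directory sector of a CFB file. It is an illustration only, not a full parser: the function name is hypothetical, the field offsets (sector shift at byte 30, first directory sector location at byte 48, 128-byte directory entries with the name length at byte 64 and the object type at byte 66) are taken from a reading of [MS-CFB], and the sketch assumes the entries of interest sit in the first directory sector rather than following the sector chain.

```python
import struct

def first_directory_entries(data: bytes) -> list[tuple[str, int]]:
    """Return (name, object_type) pairs from the first CFB directory sector.

    Assumed object types per [MS-CFB]: 1 = storage, 2 = stream, 5 = root.
    Simplification: only the first directory sector is read; the chain of
    further directory sectors is not followed.
    """
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    sector_size = 1 << sector_shift
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    # Sector n begins after the header sector, at (n + 1) * sector_size.
    start = (first_dir_sector + 1) * sector_size
    entries = []
    for pos in range(start, min(start + sector_size, len(data)), 128):
        entry = data[pos:pos + 128]
        if len(entry) < 128:
            break
        name_len, obj_type = struct.unpack_from("<HB", entry, 64)
        if obj_type == 0 or name_len < 2:
            continue  # unallocated directory entry
        # name_len counts bytes and includes the 2-byte UTF-16 terminator.
        name = entry[:name_len - 2].decode("utf-16-le")
        entries.append((name, obj_type))
    return entries
```

For a typical PPT file read into memory, the result would be expected to include "Root Entry" (or "R", as noted above) and, if they fall within the first directory sector, "PowerPoint Document" and "Current User". A maintained library such as the OleFile parser mentioned above is a better choice for real work.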

Streams that are not required by the specification, but are typically present in files written by Microsoft PowerPoint, include a SummaryInformation stream (with basic file-level metadata) and a DocumentSummaryInformation stream with additional file-level properties. A PowerPoint file in the PPT format begins as follows, with all values given as they occur in the physical file, for example when viewed using a Hex dump utility:

  • CFB header (usually 512 bytes):
    • Header Signature for the CFB format with 8-byte Hex value D0CF11E0A1B11AE1. Gary Kessler notes that the beginning of this string looks like "DOCFILE".
    • 16 bytes of zeroes
    • 2-byte Hex value 3E00 indicating CFB minor version 3E
    • 2-byte Hex value 0300 indicating CFB major version 3 or value 0400 indicating CFB major version 4. [Note: All PPT files created by compilers of this resource (in various versions of PowerPoint since 2003) and examined with a Hex dump utility have been based on CFB major version 3. Comments welcome.]
    • 2-byte Hex value FEFF indicating little-endian byte order for all integer values. This byte order applies to all CFB files.
    • 2-byte Hex value 0900 (indicating the sector size of 512 bytes used for major version 3) or 0C00 (indicating the sector size of 4096 bytes used for major version 4)
    • 480 bytes for remainder of the 512-byte header, which fills the first sector for a CFB of major version 3
    • Note: For a CFB of major version 4, the rest of the first sector would be 3,584 bytes of zeroes.
  • Internal identifiers for PowerPoint binary files. These identifier sequences may also be matched by earlier versions of CFB-based PowerPoint files. For more detailed notes on version identification, see Notes below. The compilers of this resource have observed that the streams or objects that include these sequences may occur at variable locations and not necessarily in this order:
    • 4-byte headerToken, unsigned integer, occurring in the mandatory Current User stream: Hex value 5FC091E3. This sequence is used for unencrypted PPT files. Encrypted PPT files use Hex value DFC4D1F3. See 2.3.2 CurrentUserAtom in [MS-PPT]. See also Note below on Version Identification.
    • Representation of the mandatory "PowerPoint Document" stream name in UTF-16 in the root directory. Used as signature in PRONOM entry for PUID fmt/126. Hex value 50006F0077006500720050006F0069006E007400200044006F00630075006D0065006E007400. See 2.1.2 PowerPoint Document Stream in [MS-PPT].
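The header fields listed above can be decoded with a few lines of Python. This is a minimal sketch, assuming the first 32 bytes of the file are available; the function and dictionary key names are hypothetical and chosen for illustration only.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

def parse_cfb_header(data: bytes) -> dict:
    """Decode the fixed fields at the start of a CFB file."""
    if len(data) < 32:
        raise ValueError("need at least the first 32 bytes of the file")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("CFB header signature not found")
    # Bytes 8-23 are the 16 bytes of zeroes; four 2-byte little-endian
    # fields follow at byte offset 24.
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", data, 24)
    return {
        "minor_version": minor,                 # 0x3E in the files described here
        "major_version": major,                 # 3 or 4
        "little_endian": byte_order == 0xFFFE,  # stored in the file as FE FF
        "sector_size": 1 << sector_shift,       # shift 9 -> 512, shift 12 -> 4096
    }
```

Note that the values are read as little-endian integers, so the physical bytes 3E 00 decode to 0x003E and 09 00 decodes to a sector shift of 9, i.e. 512-byte sectors.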

In addition to the optional Pictures stream for embedded images, other optional streams are used for embedded audio, encrypted content, macros, digital signatures, etc.

The PPT format was superseded as the default format for Microsoft PowerPoint starting with PowerPoint 2007 by PPTX/OOXML, the primary XML-based presentation format of the Office Open XML (OOXML) family.

Production phase Can be used in any production phase: for creating documents (initial state); for editing and review (middle state); and for final distribution.
Relationship to other formats
    Subtype of CFB_3, Microsoft Compound File Binary File Format, Version 3. The compilers of this resource have experimented with saving PowerPoint documents as PPT files in several recent versions of PowerPoint. In all cases, the resulting file was in version 3 of CFB. Comments welcome.
    Has later version PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2016, ECMA-376, Editions 1-5

Local use

LC experience or existing holdings As of January 2020, the Library of Congress has over 19,000 files with the .ppt extension and 646 files with the .pps extension in its digital collections, for a total size of over 62 gigabytes. These have come from various sources, including archived websites. As of 2020, Library of Congress staff creating presentations as part of their duties typically use the PPTX format rather than the earlier binary PPT format. Before publication on the Library of Congress website, many presentations are converted to the PDF format.
LC preference No format preference has been explicitly expressed by the Library of Congress in relation to acquisition of digital presentations for its collections. See Recommended Formats Statement.

Sustainability factors

Disclosure The Microsoft Office PowerPoint 97-2003 Binary File (PPT) format is proprietary but openly documented and covered by Microsoft's Open Specification Promise.
    Documentation The specification is available at [MS-PPT]: PowerPoint (.ppt) Binary File Format. This document is updated quite frequently; changes are documented. Although the format is supported by successive versions of PowerPoint from PowerPoint 97 through PowerPoint 2019, Appendix A: Product Behavior lists over 100 differences in behavior among product versions in relation to sub-structures, primarily as to whether a version does or does not ignore or preserve a field or record. The compilers of this resource are unable to determine what effect these differences have on integrity of presentation content or interoperability with other software. Comments welcome.
Adoption

Very widely used. PowerPoint has been the market leader for preparing presentations for many years, particularly in corporate settings. See, for example, Still Going Strong: A Short History of PowerPoint, an April 2017 post on the Eventbrite blog. See also an analysis from Datanyze of software use on top websites.

As of late 2019, most new documents created using PowerPoint will be in the default PPTX/OOXML format. However, the corpus of existing presentation files on the open web appears to have considerably more files in the binary PPT format than in the XML-based PPTX format. For example, a Google search in January 2020 of the U.S. web by filetype yielded: .ppt, 5,810,000; and .pptx, 3,270,000. The corresponding count of files in the OpenDocument Presentation Document Format (ODP) was 20,600, emphasizing the dominance of the Microsoft presentation formats on the open web. The compilers of this resource acknowledge that searches of the web are not a reliable measure of adoption for file formats at the initial (creation) phase of a content lifecycle.

All mainstream applications for creating presentations can import files in the PowerPoint 97-2003 Binary File Format. This includes: LibreOffice Impress, Apache OpenOffice Impress, Corel WordPerfect Office, Google Slides, Apple's Keynote, and Adobe Captivate.

The binary PPT format sometimes appears on lists of acceptable formats for archiving, based on its wide use; it does not usually occur as a preferred format. For example, see the National Archives of Australia and the U.S. National Archives (NARA).

A number of utilities and software libraries for examining and manipulating PPT files exist. oletools is a package of Python tools to analyze OLE and MS Office files; one of its important objectives is to detect characteristics found in malicious files. Weaponized MS Office 97-2003 legacy/binary formats (doc, xls, ppt, ...) and OLE Compound File from the Forensics Wiki also list some software libraries that can work with the formats based on the Compound File Binary format. The compilers of this resource have not determined the extent to which any of the software listed in these resources is actively maintained. Comments welcome.

Widely used commercial file conversion packages that can read and write PPT files include: LeadTools; and Aspose Slides. Aspose also has a free online viewer at https://products.aspose.app/slides/viewer. A large number of other file conversion utilities claim to convert PPT files to PPTX online at no charge, including: Free online converter from Aspose; Zamzar; OnlineConvert; Convertio; and docspal.

    Licensing and patents

Covered by Microsoft's Open Specification Promise, whereby Microsoft "irrevocably promises" not to assert any claims against those making, using, and selling conforming implementations of any specification covered by the promise (so long as those accepting the promise refrain from suing Microsoft for patent infringement in relation to Microsoft's implementation of the covered specification).

New features introduced into PPT may be subject to patent protection. However, Microsoft's interoperability principles indicate "Microsoft will also make available a list of any of its patents that cover any extensions, and will make available patent licenses on reasonable and non-discriminatory terms." As of November 2019, the patent map tool provided by Microsoft indicated that there were no patents of concern to users of the [MS-PPT] specification or the [MS-CFB] specification on which it is based.

Transparency

The PPT format is not easily interpreted with basic tools.

In a PPT file that is not encrypted or password-protected and whose text is in the Latin alphabet, the text characters of slides or notes will usually be visible in a Hex dump of the PowerPoint Document stream. Characters may be stored in 1-byte (Extended ASCII) encodings, typically in Windows code page 1252, or in UTF-16. Formatting information is stored separately.
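As an illustration of what a Hex dump can reveal, the following Python sketch pulls runs of readable characters out of raw bytes in both encodings mentioned above. It is a rough approximation of a "strings"-style utility, not a reader for the PowerPoint record structure: the function name and the minimum run length are hypothetical choices, and the UTF-16 pattern only recognizes characters in the printable ASCII range.

```python
import re

def visible_text_runs(data: bytes, min_chars: int = 4) -> list[str]:
    """Return runs of readable text found in raw bytes.

    1-byte runs are decoded as Windows code page 1252; 2-byte runs are
    decoded as UTF-16 (little-endian). 1-byte runs are listed first.
    """
    runs = []
    # 1-byte runs: printable ASCII plus the 0xA0-0xFF range of code page 1252.
    one_byte = re.compile(rb"[\x20-\x7E\xA0-\xFF]{%d,}" % min_chars)
    for match in one_byte.finditer(data):
        runs.append(match.group().decode("cp1252"))
    # UTF-16LE runs: a printable ASCII byte followed by a zero byte, repeated.
    two_byte = re.compile(rb"(?:[\x20-\x7E]\x00){%d,}" % min_chars)
    for match in two_byte.finditer(data):
        runs.append(match.group().decode("utf-16-le"))
    return runs
```

Applied to the bytes of an unencrypted PPT file, this would be expected to surface slide and notes text alongside stream names and other incidental strings; it does not distinguish between them.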

Self-documentation

Options for storing document-level metadata in a PPT file are described in a subsidiary specification, [MS-OSHARED]. A PPT file should include a Summary Information property set, which can include the following optional descriptive properties: Title, Author, Subject (description), Keywords, Comments. See 2.25.1 SummaryInformation and 2.3.3.2.1.1 PIDSI. A PPT file may contain an additional property set (known as Document Summary Information) with a fixed set of optional properties, including Manager and Company names. See 2.3.3.2.2.1 PIDDSI (Document Summary Information property set). User-defined or custom properties are also allowed. See also 2.3.3 Property Set Storage.

The [MS-PPT] specification offers no support for embedding metadata in an externally defined schema in a way that will be recognized by Microsoft PowerPoint.

External dependencies A PPT file may be designed to incorporate resources from external data sources, for example, for charts or graphics that are generated dynamically or regularly updated.
Technical protection considerations PPT files may contain sensitive information, so the format supports password protection and encryption. A single encryption approach is supported, as described at 2.3.7 CryptSession10Container in [MS-PPT]. This subclause indicates that the PPT format may only use Office binary RC4 CryptoAPI encryption. It also indicates which sections of a PPT file should or should not be encrypted.

Quality and functionality factors

Still Image
Normal rendering

No specific set of factors for assessing quality and functionality of a presentation format has been developed.

See PPTX/OOXML_2012, except that features added to PowerPoint since 2007, as documented in [MS-PPTX]: PowerPoint Extensions to the Office Open XML (.pptx) File Format, may not be supported.

Text
Normal rendering See PPTX/OOXML_2012, except that features added to PowerPoint since 2007, as documented in [MS-PPTX]: PowerPoint Extensions to the Office Open XML (.pptx) File Format, may not be supported.
Integrity of document structure See PPTX/OOXML_2012.
Integrity of layout and display See PPTX/OOXML_2012.
Functionality beyond normal rendering See PPTX/OOXML_2012.

File type signifiers and format identifiers

Tag Value Note
Filename extension ppt
    Documented in the specification and elsewhere by Microsoft in many locations, for example at Office 2007 File Format MIME Types for HTTP Content Streaming.
Internet Media Type application/vnd.ms-powerpoint
    Documented by Microsoft at Office 2007 File Format MIME Types for HTTP Content Streaming. Also listed at IANA; see 1996 registration correspondence at https://www.iana.org/assignments/media-types/application/vnd.ms-powerpoint
Magic numbers Hex: D0 CF 11 E0 A1 B1 1A E1
    Documented in the CFB specification, in 2.2 Compound File Header. Applies to all files in CFB format; see GCK'S File Signatures Table entry for Compound Binary File format (aka OLECF).
File signature Hex: 3E 00 03 00 FE FF 09 00
    At byte offset 24 from beginning of file. Indicates CFB (Compound File Binary format) major version 3, minor version 3E. Assumes that all PPT files use this version of CFB. Comments welcome.
File signature Hex: 50 00 6F 00 77 00 65 00 72 00 50 00 6F 00 69 00 6E 00 74 00 20 00 44 00 6F 00 63 00 75 00 6D 00 65 00 6E 00 74 00
    Represents the string "PowerPoint Document" in UTF-16. Occurs in mandatory Root Entry directory. Used in PRONOM entry for PUID: fmt/126. May also match earlier CFB-based PowerPoint formats.
File signature Hex: 5F C0 91 E3
    Occurs in the CurrentUserAtom in the mandatory Current User stream; see 2.3.2 CurrentUserAtom in the [MS-PPT] specification. The documentation released by Microsoft in 2007 for PowerPoint 97-2007 indicates that the sequence is a "magic number" to ensure this is an unencrypted PowerPoint file. An encrypted PPT file has a different sequence at the same location. May also match earlier CFB-based PowerPoint formats.
Pronom PUID fmt/126
    PRONOM has a number of entries for Microsoft PowerPoint format variants with the .ppt extension. The PRONOM entry that corresponds to the scope of this format description is http://www.nationalarchives.gov.uk/PRONOM/fmt/126.
Wikidata Title ID Q24834502
    See https://www.wikidata.org/wiki/Q24834502.

Notes

General

Version identification: Precise identification of this version of the format with the .ppt extension appears complex. The compilers of this resource have not located a single source that appears to be complete. Below is a list of clues that can contribute to format identification.

  • Identification of CFB as wrapper: Magic number at byte offset 0 with Hex string D0CF11E0A1B11AE1
    Applies to all CFB files.
  • Identification of CFB version: At byte offset 24 from beginning of file, Hex string 3E000300FEFF0900
    Indicates CFB (Compound File Binary format) major version 3, minor version 3e. Assumes that all PPT files use this version of CFB. All files in the Microsoft binary formats with .doc, .ppt, and .xls extensions observed by compilers of this resource (using Hex dumps on a small selection of files known to have been written by different versions of Microsoft Office applications) have used this CFB version.
  • Identification of .ppt file as an unencrypted PowerPoint file: At variable physical location, within the Current User stream. Hex string 5FC091E3
    May also apply to earlier versions of CFB-based PowerPoint files.
  • Representation of "PowerPoint Document" in UTF-16. At variable physical location, in mandatory Root Entry directory. Hex string 50006F0077006500720050006F0069006E007400200044006F00630075006D0065006E007400
    This is used as a signature in the PRONOM entry for PUID fmt/126. May also apply to earlier versions of CFB-based PowerPoint files.
  • Version fields in Current User stream: The [MS-PPT] specification includes mandatory values for three adjacent version identifiers. Hex string F4030300
    See specification at 2.3.2 CurrentUserAtom and example at 3.2 File Structure. May also apply to earlier versions of CFB-based PowerPoint files.
  • Existence of current user name in ANSI and in UTF-16: In Current User stream. This is the only clue found by the compilers of this resource that appeared to distinguish clearly between a file created in a version of PowerPoint for Windows prior to PowerPoint 97 and more recent versions. Unfortunately, it appears possible to create a presentation in PowerPoint 2019 for Mac without an actual user name for the current user.
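Several of the clues listed above can be checked mechanically. The Python sketch below gathers them into one report. It is deliberately naive and should be read as an illustration under stated assumptions: the function and key names are hypothetical, the stream name and header tokens are searched for anywhere in the file rather than being located through the CFB directory, and a short token such as 5FC091E3 could in principle occur by coincidence in unrelated data.

```python
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
CFB_V3_FIELDS = bytes.fromhex("3E000300FEFF0900")
DOC_STREAM_NAME = "PowerPoint Document".encode("utf-16-le")
TOKEN_UNENCRYPTED = bytes.fromhex("5FC091E3")
TOKEN_ENCRYPTED = bytes.fromhex("DFC4D1F3")

def ppt_clues(data: bytes) -> dict:
    """Report which identification clues are present in the raw file bytes.

    None of these clues alone distinguishes the PowerPoint 97-2003 format
    from earlier CFB-based PowerPoint formats.
    """
    return {
        "cfb_magic": data[:8] == CFB_MAGIC,             # byte offset 0
        "cfb_major_version_3": data[24:32] == CFB_V3_FIELDS,  # byte offset 24
        "powerpoint_document_name": DOC_STREAM_NAME in data,  # variable location
        "unencrypted_header_token": TOKEN_UNENCRYPTED in data,
        "encrypted_header_token": TOKEN_ENCRYPTED in data,
    }
```

A file for which the first three values are true and exactly one of the two token values is true is consistent with a CFB-based PowerPoint file, although, as noted above, these checks do not by themselves separate the format described here from earlier CFB-based versions.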

The compilers of this resource would welcome comments based on more extensive investigation of CFB-based PowerPoint files.

Security threats: Weaponized MS Office 97-2003 legacy/binary formats (doc, xls, ppt, ...) lists some general threats for Microsoft binary formats, including the ability to embed Flash objects and macros.

Custom XML feature in PPT format: The ability to store custom data in user-defined XML was added to Office applications in Office 2007. This feature was known as "Custom XML" and support for embedding Custom XML was added to the PPT format. According to Custom XML Data from 2013, the "capability wasn't used very much, but when it was used it was usually by add-ins or macros, not by end users." The compilers of this resource have not determined the degree to which this feature has been used in PPT files. Comments welcome.

History

The origin of PowerPoint was in 1984 at Forethought Inc., founded by Robert Gaskins, who sold the company to Microsoft in 1987. See chronology at A Brief History of PowerPoint Told in its Own Words. See also The dream boss? What it was like to work for Bill Gates from Zamzar for an interview with Gaskins on the early history of the Microsoft PowerPoint application. The Wikipedia entry for Microsoft PowerPoint provides a chronology of different PowerPoint formats that used the extension .ppt.

The version of Microsoft's PPT format described here was introduced in 1997 and was the default file format for PowerPoint until 2007. Starting with PowerPoint 2007, the default format for PowerPoint presentations became PPTX/OOXML.


Format specifications


Useful references

URLs


Last Updated: 12/08/2020