Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Microsoft Compound File Binary File Format, Version 3 |
---|---|
Description |
Microsoft Compound File Binary (CFB) file format is also known as the Object Linking and Embedding (OLE) or Component Object Model (COM) structured storage compound file implementation binary file format. CFB implements a simplified file system through a hierarchical collection of storage objects and stream objects. A storage object is comparable to a file system directory: just as a directory can contain other directories and files, a storage object can contain other storage objects and stream objects. A parent storage object can also track the locations and sizes of the child storage objects and stream objects nested beneath it. A stream object is comparable to a file in that a stream contains user-defined data stored as a consecutive sequence of bytes. A compound file consists of the root storage object with optional child storage objects and stream objects in a nested hierarchy.
The purpose of structured storage is to reduce the performance penalties and overhead associated with storing separate objects in a flat file. Structured storage solves performance problems by eliminating the need to totally rewrite a file whenever a new object is added or an existing object increases in size. The new data is written to the next available free location in the file, and the storage object updates an internal structure that maintains the locations of its storage objects and stream objects. At the same time, structured storage enables end users to interact with and manage a compound file as if it were a single file rather than a nested hierarchy of separate objects.
There are two active versions of CFB, version 3 and version 4. One major distinction between the versions is that the sector size for version 3 is 512 bytes and the sector size for version 4 is 4096 bytes. A compound file is divided into equal-length sectors, the smallest addressable unit of a disk. The first sector contains the compound file header.
Subsequent sectors are identified by a 32-bit non-negative integer number, called the sector number. A group of sectors can form a sector chain, which is a linked list of sectors forming a logical byte array, even though the sectors can be in non-consecutive locations in the compound file. The main structure used to manage sector allocation and sector chains is the file allocation table (FAT). The FAT contains an array of 32-bit sector numbers, where the index represents a sector number, and its value represents the next sector in the chain, or a special value. This allows a compound file to contain many sector chains in a single file. The known size of all structures within a compound file must be specified when the compound file is transmitted or retrieved. For this reason, CFB is not recommended for real-time streaming, progressive rendering, or open-ended data protocols where the size of streams is unknown at the time of transmission. The minimum size of a compound file is three sectors: one header, one FAT sector and one directory sector.
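The FAT lookup described above can be illustrated with a minimal sketch. The special values (`ENDOFCHAIN`, `FREESECT`, and so on) are the constants defined in [MS-CFB] and are not named in the text above; the helper names `sector_offset` and `follow_chain` and the toy FAT are hypothetical and chosen only for illustration. The offset calculation assumes a version 3 file, where the header occupies the first 512 bytes and sector 0 follows it.

```python
# Special FAT values as defined in [MS-CFB].
FREESECT = 0xFFFFFFFF    # unallocated sector
ENDOFCHAIN = 0xFFFFFFFE  # terminates a sector chain
FATSECT = 0xFFFFFFFD     # sector holds part of the FAT itself
MAXREGSECT = 0xFFFFFFFA  # largest regular sector number


def sector_offset(sector_number, sector_size=512):
    """Byte offset of a sector, assuming the header fills the first sector."""
    return (sector_number + 1) * sector_size


def follow_chain(fat, start):
    """Return the sector numbers of the chain that begins at `start`.

    `fat` is a list where fat[n] is the sector that follows sector n,
    or a special value. Raises ValueError on an out-of-range entry or a loop.
    """
    chain = []
    seen = set()
    current = start
    while current != ENDOFCHAIN:
        if current > MAXREGSECT or current >= len(fat) or current in seen:
            raise ValueError(f"corrupt sector chain at {current:#x}")
        seen.add(current)
        chain.append(current)
        current = fat[current]
    return chain


# Toy FAT (hypothetical): sector 0 holds the FAT, one chain runs
# 1 -> 3 -> 2, sector 4 is a one-sector chain, sector 5 is free.
toy_fat = [FATSECT, 3, ENDOFCHAIN, 2, ENDOFCHAIN, FREESECT]

chain = follow_chain(toy_fat, 1)
print(chain)                              # [1, 3, 2]
print([sector_offset(s) for s in chain])  # [1024, 2048, 1536]
```

The chain is logically contiguous even though its sectors sit at non-consecutive offsets, which is why two chains can interleave in the same file without either being rewritten.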
A file in version 3 of the CFB format begins with a 512-byte header.
The structured storage profile of CFB formed the basis for the AAF specification and for the Microsoft Office Binary Formats that were the default formats for Word, PowerPoint, and Excel from products released in 1997 through 2004 (MS-DOC, MS-PPT, and MS-XLS). |
Relationship to other formats | |
Has subtype | MSG, Microsoft Outlook Item |
Has subtype | MS-DOC, Microsoft Office Word 97-2003 Binary File Format (.doc) |
Has subtype | MS-PPT, Microsoft Office PowerPoint 97-2003 Binary File Format (.ppt) |
Has subtype | MS-XLS, Microsoft Office Excel 97-2003 Binary File Format (.xls, BIFF8) |
Has later version | CFB_4, Microsoft Compound File Binary File Format, Version 4 |
Affinity to | AAF_1_1, Advanced Authoring Format (AAF) Object, Version 1.1. Early versions of the AAF format detailed use of the structured storage systems outlined in CFB to store the objects on disk. |
Affinity to | WPD, WordPerfect Document Family. According to WPD from Archiveteam.org, WordPerfect version 7 can also store documents known as "WordPerfect Compound File" using the Microsoft OLE Compound file format with the same WPD extensions. OLE embedded objects are stored inside a storage called PerfectOffice_OBJECT, whereas the real document part is now stored as stream PerfectOffice_MAIN. In principle, the format of this internal document part is the same as in previous versions, but one difference is that the minor version number is raised from 1 to 2. |
LC experience or existing holdings | See various subtypes for holdings information. |
---|---|
LC preference | See the Recommended Formats Statement for the Library of Congress format preferences. |
Disclosure | Fully documented. Proprietary file format developed by Microsoft. |
---|---|
Documentation | Microsoft [MS-CFB]: Compound File Binary File Format specification, available from Microsoft. |
Adoption | CFB is implemented in a wide range of Microsoft products including Office for Mac 1998 - 2008 and Windows operating systems NT 4.0 through Windows 10. See Exploring the Compound File Binary Format from 2009, which describes CFB as "the bread and butter for the Microsoft Office suite of applications for many years." |
Licensing and patents | CFB is covered by Microsoft's Open Specification Promise. |
Transparency | Depends on subtype or implementation. |
Self-documentation | None |
External dependencies | None |
Technical protection considerations |
Because a compound file is stored as a single file in the file-system, normal file-system security mechanisms can be used to secure the compound file. This includes read/write permissions, Access Control List (ACL), and encryption (NTFS EFS or BitLocker) where appropriate. Some subtypes permit encryption and password protection of specified streams. See, for example, [MS-OFFCRYPTO], used for Microsoft Office Binary File Formats. |
Tag | Value | Note |
---|---|---|
Filename extension | Not applicable. | Depends on subtype |
Internet Media Type | See note. | Depends on subtype |
Magic numbers | Hex: D0 CF 11 E0 A1 B1 1A E1 | Documented in the CFB specification, in 2.2 Compound File Header. Applies to all files in CFB format; see GCK's File Signatures Table entry for Compound Binary File format (aka OLECF). |
File signature | Hex: 3E 00 03 00 FE FF 09 00 | At byte offset 24 from beginning of file. Documented in specification at 2.2 Compound File Header. This sequence indicates CFB (Compound File Binary format) major version 3, minor version 0x003E. The specification states that the minor version should always be indicated as 0x003E. |
General |
In addition to the Major Version field, which declares the version number in the header, the Sector Shift field specifies the sector size, and its required value depends on the declared version. If Major Version is 3, then the Sector Shift must be 0x0009, specifying a sector size of 512 bytes. In a compound file, all integer fields, including Unicode characters encoded in UTF-16, must be stored in little-endian byte order. The only exception is in user-defined data streams, where the compound file structure does not impose any restrictions. |
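As an illustration of the signature bytes and header rules above, the following is a minimal sketch of a version 3 header check. The function name `check_v3_header` and the sample bytes are hypothetical. The field at byte offset 28 (FE FF) is called Byte Order in [MS-CFB], a name not used in the table above. Bytes 8 through 23 are not examined here and are filled with zeros in the sample only as padding.

```python
import struct

# Magic number found at byte offset 0 of every compound file.
MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def check_v3_header(header):
    """Check the leading fields of a CFB version 3 header.

    Returns the decoded fields, or raises ValueError on a mismatch.
    """
    if len(header) < 32:
        raise ValueError("need at least the first 32 bytes of the file")
    if header[:8] != MAGIC:
        raise ValueError("magic number mismatch: not a compound file")

    # Four little-endian unsigned 16-bit fields starting at offset 24:
    # minor version, major version, byte order, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<4H", header, 24)

    if major != 3:
        raise ValueError(f"major version is {major}, expected 3")
    if byte_order != 0xFFFE:
        raise ValueError("byte order field does not indicate little-endian")
    if sector_shift != 0x0009:
        raise ValueError("version 3 requires a sector shift of 0x0009")

    return {
        "minor_version": minor,
        "major_version": major,
        "sector_size": 1 << sector_shift,  # 2 to the power of the shift
    }


# Hypothetical sample: magic, 16 padding bytes, then the file signature
# bytes listed above for byte offset 24.
sample = MAGIC + bytes(16) + bytes.fromhex("3E000300FEFF0900")
print(check_v3_header(sample))
# {'minor_version': 62, 'major_version': 3, 'sector_size': 512}
```

The minor version is reported but not enforced, since the specification words that value as a recommendation while the sector shift for major version 3 is a requirement.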
---|---|
History |
|