Sustainability of Digital Formats: Planning for Library of Congress Collections

Introduction | Sustainability Factors | Content Categories | Format Descriptions | Contact
Format Description Categories >> Browse Alphabetical List

Microsoft Compound File Binary File Format, Version 3

>> Back
Table of Contents
Format Description Properties Explanation of format description terms

Identification and description Explanation of format description terms

Full name Microsoft Compound File Binary File Format, Version 3
Description

Microsoft Compound File Binary (CFB) file format is also known as the Object Linking and Embedding (OLE) or Component Object Model (COM) structured storage compound file implementation binary file format.  CFB implements a simplified file system through a hierarchical collection of storage objects and stream objects. A storage object is comparable to a file system directory in that just as a directory can contain other directories and files, a storage object can contain other storage objects and stream objects. A parent storage object can also track the locations and sizes of the child storage object and stream objects nested beneath it. A stream object is comparable to a file in that a stream contains user-defined data stored as a consecutive sequence of bytes. A compound file consists of the root storage object with optional child storage objects and stream objects in a nested hierarchy.

The purpose of structured storage is to reduce the performance penalties and overhead associated with storing separate objects in a flat file. Structured storage solves performance problems by eliminating the need to totally rewrite a file whenever a new object is added, or an existing object increases in size. The new data is written to the next available free location in the file, and the storage object updates an internal structure that maintains the locations of its storage objects and stream objects. At the same time, structured storage enables end users to interact and manage a compound file as if it were a single file rather than a nested hierarchy of separate objects.

There are two active versions of CFB, version 3 and version 4. One major distinction between the versions is that the sector size for version 3 is of 512 bytes and the sector size for version 4 is 4096 bytes.

A compound file is divided into equal-length sectors, the smallest addressable unit of a disk. The first sector contains the compound file header. Subsequent sectors are identified by a 32-bit non-negative integer number, called the sector number. A group of sectors can form a sector chain, which is a linked list of sectors forming a logical byte array, even though the sectors can be in non-consecutive locations in the compound file. The main structure used to manage sector allocation and sector chains is the file allocation table (FAT).The FAT contains an array of 32-bit sector numbers, where the index represents a sector number, and its value represents the next sector in the chain, or a special value. This allows a compound file to contain many sector chains in a single file.

The known size of all structures within a compound file must be specified when the compound file is transmitted or retrieved. For this reason, CFB is not recommended for real-time streaming, progressive rendering, or open-ended data protocols where the size of streams is unknown at the time of transmission.

The minimum size of a compound file is three sectors: one header, one FAT sector and one directory sector.

  • A 512-byte sector compound file MUST be no greater than 2 GB in size for compatibility reasons. This means that every stream, including the directory entry array and mini stream, inside a 512-byte sector compound file must be less than 2 GB in size.
  • The maximum number of directory entries (storage objects, stream objects, and unallocated objects) in a 512-byte sector compound file is limited by the 2 GB file size, resulting in slightly less than 16 million directory entries.
  • The maximum size of the mini stream is slightly less than 256 GB. The maximum size of the mini stream in a 512-byte sector compound file is limited by the 2 GB file size.

The structured storage profile of CFB formed the basis for the AAF specification.

Relationship to other formats
    Has subtype MSG, Microsoft Outlook Item
    Has later version CFB_4, Microsoft Compound File Binary File Format, Version 4
    Affinity to AAF_1_1, Advanced Authoring Format (AAF) Object, Version 1.1.

Early versions of the AAF format detailed use of the structured storage systems outlined in CFB to store the objects on disk.


Local use Explanation of format description terms

LC experience or existing holdings  
LC preference  

Sustainability factors Explanation of format description terms

Disclosure Fully documented. Proprietary file format developed by Microsoft.
    Documentation Microsoft [MS-CFB]: Compound File Binary File Format specification available from Microsoft.
Adoption CFB is implemented in a wide range of Microsoft products including Office for Mac 1998 - 2008 and Windows operating systems NY 4.0 - 8.1.
    Licensing and patents CFB is a Microsoft product and may be covered by licenses and patents. CFB is not covered by Microsoft's Open Specification Promise or Community Promise.
Transparency Dependant on the implementation.
Self-documentation None
External dependencies None
Technical protection considerations

Because a compound file is stored as a single file in the file-system, normal file-system security mechanisms can be used to secure the compound file. This includes read/write permissions, Access Control List (ACL), and encryption (NTFS EFS or BitLocker) where appropriate.


Quality and functionality factors Explanation of format description terms


File type signifiers and format identifiers Explanation of format description terms

Tag Value Note
Filename extension Not applicable.  Depends on subtype
File signature Hex: D0 CF 11 E0 A1 B1 1A E1
Hex: 0x0003 (version 3)
From specification. This specification applies to the all Microsoft content types that share the general CFB structure.

Notes Explanation of format description terms

General

In addition to the Major Version field value declaration of the version number in the header, the Sector Shift field specifies the sector size depending on the version declaration. If Major Version is 3, then the Sector Shift must be 0x0009, specifying a sector size of 512 bytes.

In a compound file, all integer fields, including Unicode characters encoded in UTF-16, must be stored in little-endian byte order. The only exception is in user-defined data streams, where the compound file structure does not impose any restrictions.

History  

Format specifications Explanation of format description terms


Useful references

URLs


Last Updated: Monday, 27-Feb-2017 09:55:24 EST