Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Microsoft Outlook 2003 Personal Folders File (Unicode) |
---|---|
Description |
The Personal Folders File or PST is an open proprietary data file format used to store local copies of messages, calendar events, and other items within Microsoft software including Microsoft Office Outlook. PST files are used to store archived items and to maintain off-line availability of the items. See PST_ANSI for a description of general PST structure and characteristics. The two versions of PST, PST_ANSI and PST_Unicode, are differentiated primarily by software implementation versions, character sets, maximum file sizes, and the bit width of internal identifiers. PST_Unicode is the default format for Office Outlook starting with Outlook 2003, including Outlook 2007, Outlook 2010, and Outlook 2013. It employs the Unicode character set. The file size limits for PST_Unicode are significantly larger than the PST_ANSI overall limit of 2 gigabytes (GB). PST_Unicode supports file sizes up to 20 GB in Outlook 2003 and Outlook 2007 and up to 50 GB in Outlook 2010 and Outlook 2013. According to Microsoft, these limits can be raised, but doing so would negatively affect performance. PST_Unicode uses 64-bit values to represent block IDs (BIDs) and byte indexes (IBs). |
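The default size limits described above can be expressed as a small lookup, for example when triaging files received for ingest. This is a minimal sketch, not part of any Microsoft tooling: the names `DEFAULT_LIMIT_BYTES` and `fits_default_limit` are hypothetical, and reading "GB" as binary gigabytes (1024^3 bytes) is an assumption.

```python
GIB = 1024 ** 3  # assumption: "GB" in the description is read as 1024**3 bytes

# Default PST_Unicode size limits as stated in the description.
DEFAULT_LIMIT_BYTES = {
    "Outlook 2003": 20 * GIB,
    "Outlook 2007": 20 * GIB,
    "Outlook 2010": 50 * GIB,
    "Outlook 2013": 50 * GIB,
}

# Overall size limit of the earlier PST_ANSI version, for comparison.
PST_ANSI_LIMIT_BYTES = 2 * GIB


def fits_default_limit(size_bytes: int, outlook_version: str) -> bool:
    """Return True if a PST_Unicode file of this size is within the
    default limit for the given Outlook version (hypothetical helper)."""
    return size_bytes <= DEFAULT_LIMIT_BYTES[outlook_version]
```

Under these assumptions, a 30 GB file falls within the default limit for Outlook 2010 but exceeds the default for Outlook 2007.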
Production phase | PST files provide a mechanism for the centralized storage of email folders, email messages, their attachments, contacts, calendar items, etc. |
Relationship to other formats | |
Has earlier version | PST_ANSI, Microsoft Outlook PST 97-2002 (ANSI) |
Affinity to | TNEF, Transport Neutral Encapsulation Format |
LC experience or existing holdings | The Library of Congress includes PST Unicode and PST ANSI files in its collections, especially in the Manuscripts and Music Divisions as well as other personal papers repositories. |
---|---|
LC preference | The Library of Congress Recommended Formats Statement (RFS) lists PST as an acceptable format for Email: For aggregated groups of messages. The RFS does not specify a version of PST. |
Disclosure | Fully documented. Proprietary file format developed by Microsoft. |
---|---|
Documentation | Microsoft [MS-PST]: Outlook Personal Folders (.pst) File Format specification available from Microsoft. See Format Specifications below. |
Adoption |
The Outlook .pst files are used for POP3, IMAP, and HTTP accounts and are supported by several Microsoft client applications, including Microsoft Exchange Client, Windows Messaging, and Microsoft Office Outlook. Outlook 2003, Outlook 2007, Outlook 2010, and Outlook 2013 can read, write, and create both ANSI and Unicode PST files. By 2010 (when the specification was made public by Microsoft), PST_ANSI was considered a legacy format, with a recommendation that it not be used to create new PST files. The default format was declared to be PST_Unicode. PST_Unicode files are not compatible with Microsoft Outlook 97-2002, which reads PST_ANSI files only. At least two open-source software libraries have been developed to examine and manipulate PST files: libpff, a C library (with Python bindings partially implemented as of late 2013) for accessing PST and related formats; and PST File Format SDK, a cross-platform C++ library for reading PST files, developed under Microsoft auspices through a 2009-2010 project. According to Microsoft, Outlook .PST files are supported in OneDrive but "they are synced less frequently compared to other file types to reduce network traffic." If users "enable PC folder backup (Known Folder Move) manually without the group policy, they will see an error if they have a .PST file in one of their known folders (e.g. Documents). If Known Folder Move is enabled and configured via group policy, .PST files will be migrated." |
Licensing and patents | See PST_ANSI |
Transparency | See PST_ANSI |
Self-documentation |
The PST format version is declared in the file header. According to the specification, the wVer field for a PST_Unicode file must have a value of 23. Folder objects, message objects, and attachment objects all have properties which include the header fields users typically see in an email application as well as many properties relating to the status, management, and history of the object in an Outlook application. A message object also has a recipients table that identifies each recipient and may have an attachments table that lists and identifies attachments. |
External dependencies | None |
Technical protection considerations | See PST_ANSI |
Text | |
---|---|
Normal rendering | PST_Unicode can only represent UTF-16 strings (Unicode character encoding). |
Integrity of document structure |
At the physical level, the file starts with a header, followed by an optional density list, and then a series of mapping structures interspersed at set intervals between blocks of data. The mapping structures are of fixed size and repeat as often as needed to encapsulate areas of data as the file grows. At the logical level, a .pst file has three layers: the Node Database (NDB) layer, the Lists, Tables, and Properties (LTP) layer, and the Messaging layer. An important structural improvement of PST_Unicode over PST_ANSI is that PST_Unicode files contain further FPMap pages beyond the initial FPMap in the HEADER, thereby extending their size limit beyond the 2 GB limit of PST_ANSI files. The semantic structure of messages (with their headers) in folders, and of attachments linked to messages, is represented in the Messaging layer. Since this format is designed for active use in an email system as a stand-alone message store, the full semantics required and/or observed in the system that generated the file are represented. |
Tag | Value | Note |
---|---|---|
Filename extension | See related format. | See PST_ANSI |
Internet Media Type | See related format. | See PST_ANSI |
Magic numbers | See related format. | See PST_ANSI |
File signature | Hex: 53 4D 17 00 or Hex: 53 4D 15 00 |
Offset 8 bytes from start of file. In conjunction with the magic number at the beginning of the file, this identifies the file as a PST file using the PST_Unicode version. The 0x17 value is much more frequently found. According to Metz in Personal Folder File (PFF) file format specification: Analysis of the PFF format, the 0x15 value is believed to indicate the same format as the 0x17 value (i.e. PST_Unicode). It was found in a 64-bit PST file created by the software Visual Recovery for Exchange Server, but it is not common. |
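The signature test described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a substitute for an identification tool such as one driven by PRONOM signatures. The function names are hypothetical, and the magic number `!BDN` (hex 21 42 44 4E) at offset 0 is taken from [MS-PST] rather than from this page, so treat it as an assumption here.

```python
import struct

# Assumption (from [MS-PST], not stated on this page): the file begins
# with the 4-byte magic number "!BDN" (hex 21 42 44 4E).
PST_MAGIC = b"!BDN"

# The two 4-byte signatures listed above, found at offset 8.
UNICODE_SIGNATURES = (bytes.fromhex("534D1700"), bytes.fromhex("534D1500"))


def is_pst_unicode(header: bytes) -> bool:
    """Return True if the first 12 bytes look like a PST_Unicode header."""
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return False
    return header[8:12] in UNICODE_SIGNATURES


def read_version(header: bytes) -> int:
    """Read the last two signature bytes (offset 10) as a little-endian
    16-bit integer; hex 17 00 decodes to 23."""
    return struct.unpack_from("<H", header, 10)[0]
```

In practice only the first 12 bytes of the file need to be read, for example with `open(path, "rb").read(12)`, before passing them to `is_pst_unicode`.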
Pronom PUID | x-fmt/249 |
PRONOM entry for Microsoft Outlook Personal Folders (Unicode). Identification based on internal signifier. |
Wikidata Title ID | Q1480633 |
See https://www.wikidata.org/wiki/Q1480633. Wikidata does not distinguish between versions of PST. |
General | See PST_ANSI |
---|---|
History |
|