Sustainability of Digital Formats: Planning for Library of Congress Collections


Rich Text Format (RTF) Family

Format Description Properties

Identification and description

Full name Rich Text Format (RTF) Family
Description

The Rich Text Format (RTF) is a proprietary document file format that Microsoft Corporation developed, and maintained through several versions until 2008, for cross-platform document interchange among Microsoft products. In 1992, Microsoft published version 1.0 of the RTF specification, the first of many 1.x versions introduced to handle new features of Microsoft applications, particularly Microsoft Word. However, Microsoft had used RTF before 1992: an early description of the format appeared in the March 1987 issue of the Microsoft Systems Journal (MSJ) in "Rich Text Format Standard Makes Transferring Text Easier" by Nancy Andrews. Between 1987 and 2008, Microsoft extended the RTF specification several times to add support for new features in Microsoft Word; see Notes below for a chronology of versions. Version 1.9.1, published in March 2008 and compatible with Word 2007, is the last release. The intent of RTF was to represent the full content of a Word .doc file, including not just formatted text but also embedded images, comments, tracked changes, etc. In "Information about the Rich Text Format (RTF) version specifications for various versions of Word," Microsoft associates RTF version numbers with versions of Microsoft Word. The specification was occasionally extended to support content representation associated with other applications, primarily Outlook and Exchange, in which messages can be encoded as HTML, RTF, or plain text.

The Wikipedia entry for Rich Text Format states, "The Rich Text Format was the standard file format for text-based documents in applications developed for Microsoft Windows." The WinHelp format, used for Windows help files for many years until it was deprecated in 2006, employed RTF for textual content; see the Wikipedia entry for WinHelp and MSHelpWiki: WinHelp. Microsoft has offered the option of storing mail messages in RTF in its mail applications; as of 2017, this option is still supported but discouraged. WordPad, introduced by Microsoft as part of Windows 95, supports RTF natively. RTF has been widely used by developers of applications for Windows for transferring formatted text between applications or for input of formatted text by users. See, for example, "How to: Convert RTF to Plain Text (C# Programming Guide)," which states, "In the .NET Framework, you can use the RichTextBox control to create a word processor that supports RTF and enables a user to apply formatting to text in a WYSIWIG manner." The Cocoa framework used in Mac OS also uses RTF as the basis for supporting user entry of formatted text. Most word processors and a wide variety of other non-Microsoft applications support import and export of formatted text using RTF. See Adoption (in Sustainability Factors) and Useful References below.

Since the introduction of Word 2010, Microsoft's use of RTF as a file format has been declining. According to a 2010 technical note from Microsoft, "The RTF file format is no longer enhanced to include new features and functionality. Features and functionality that are new to Word 2010 and future versions of Word are lost when they are saved in RTF." Microsoft states in the specification for RTF 1.9.1, "RTF allows documents to migrate forward and backward in time: old readers can read the most recent RTF and new readers can read old RTF ... Files created with an earlier version of Word using RTF should be read without problem by newer versions of Word. Older versions of Word ignore control words and groups they don’t understand." The Save As operation in Word 2016 includes RTF as an option, but states that saving a Microsoft Office Word document in RTF "does not reliably preserve the formatting, layout, or other features of the document." According to "Document format support in the new Word apps and Word 2016" from the Ctrl Blog, RTF files cannot be read in Word Online or the Word apps for tablets.

RTF files are plain text, usually 7-bit ASCII; runs of text containing non-ASCII characters must be converted to appropriate code values. An RTF file comprises a series of control words, control symbols, and groups. A control word is a backslash followed by a sequence of letters, optionally followed by a numeric parameter (as in \b0), and terminated by a space (which is consumed as part of the control word) or by any other character that is neither a letter nor a digit. Control words support the representation of the structure, formatting, and layout of document text. For example, \b turns on bold and \b0 turns it off, while \hl and \footer introduce the representation of a hyperlink and a page footer, respectively. A control symbol consists of a backslash followed by a single nonalphabetic character. For example, \~ represents a nonbreaking space. A group is enclosed in braces ({ }) and consists of text together with control words or control symbols; it specifies the text affected by the group and the attributes of that text. An RTF file is not necessarily divided into lines. ASCII line terminators (carriage return and line feed characters) may be inserted for readability; RTF readers ignore them unless they follow a backslash.
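The token types just described can be illustrated with a small tokenizer. This is a minimal sketch in Python, not code from the RTF specification or from Microsoft; the regular expression and the function name tokenize are assumptions made for illustration. It treats \'hh (a byte given as two hexadecimal digits) as its own token type and does not handle \binN, which is followed by raw binary data rather than text. Input is assumed to be a Python str; reading a file with the latin-1 codec preserves each byte value.

```python
import re
from typing import Iterator, Optional, Tuple

# One alternative per token kind, following the syntax summarized above.
_TOKEN_RE = re.compile(
    r"(?P<word>\\(?P<name>[a-zA-Z]+)(?P<param>-?\d+)? ?)"  # control word (+ delimiter space)
    r"|(?P<hex>\\'(?P<hexval>[0-9a-fA-F]{2}))"            # \'hh byte escape
    r"|(?P<symbol>\\[^a-zA-Z])"                           # control symbol, e.g. \~ or \{
    r"|(?P<open>\{)|(?P<close>\})"                        # group delimiters
    r"|(?P<text>[^\\{}\r\n]+)"                            # plain text
    r"|(?P<newline>[\r\n]+)"                              # bare line breaks
)

Token = Tuple[str, str, Optional[int]]  # (kind, value, numeric parameter)


def tokenize(rtf: str) -> Iterator[Token]:
    """Yield tokens of kind 'word', 'hex', 'symbol', 'open', 'close' or 'text'."""
    pos = 0
    while pos < len(rtf):
        m = _TOKEN_RE.match(rtf, pos)
        if m is None:  # e.g. a lone backslash at the very end of the input
            raise ValueError(f"cannot tokenize RTF at offset {pos}")
        pos = m.end()
        if m.group("word") is not None:
            p = m.group("param")
            yield ("word", m.group("name"), int(p) if p is not None else None)
        elif m.group("hex") is not None:
            yield ("hex", m.group("hexval"), int(m.group("hexval"), 16))
        elif m.group("symbol") is not None:
            yield ("symbol", m.group("symbol")[1], None)
        elif m.group("open") is not None:
            yield ("open", "{", None)
        elif m.group("close") is not None:
            yield ("close", "}", None)
        elif m.group("text") is not None:
            yield ("text", m.group("text"), None)
        # The newline group matches bare CR/LF, which is skipped.
```

For example, tokenizing {\b bold\b0} yields an opening-brace token, the control word b, the text bold, the control word b with parameter 0, and a closing-brace token.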

An RTF file has a Header section, beginning with the control word \rtf0 or \rtf1 immediately after the file's opening brace, followed by a Document section. The digit (0 or 1) terminating this introductory control word represents the major version of the governing RTF specification and will usually be 1. The next control word specifies the character set used in the file as one of \ansi (the default), \mac, \pc, or \pca. From Word 97 onward, Word has supported Unicode, although Word was not fully based on Unicode until Word 2002. For versions 1.5 and later of RTF, use of Unicode has been facilitated by conversion to a specified ANSI code page when writing an RTF file. The code page used for conversion between the ANSI encoding in the RTF file and Unicode is indicated by the control word \ansicpgN, e.g., \ansicpg1252 for the code page commonly used in Windows for English and other Western European languages. The rest of the Header comprises groups containing information about fonts, styles, and overall document-formatting properties.
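As an illustration of these header conventions, the following sketch (Python; parse_header and decode_hex_escapes are hypothetical names) reads the major version, character set, and \ansicpgN code page, then uses that code page to decode \'hh escapes in text. It assumes that the header settings appear before the first nested group (normally the font table), that the code page is a single-byte one, and that Python has a codec named cpN for it (true for cp1252, for example); multibyte East Asian code pages would need more careful handling.

```python
import re


def parse_header(rtf: str) -> dict:
    """Return the major version, character set and ANSI code page of an RTF string."""
    m = re.match(r"\{\\rtf(\d+)", rtf)
    if m is None:
        raise ValueError("missing '{\\rtf' signature")
    # Assumption: the settings are control words between the signature
    # and the first nested group (normally the font table).
    end = rtf.find("{", m.end())
    prefix = rtf[m.end():] if end == -1 else rtf[m.end():end]
    info = {"version": int(m.group(1)), "charset": "ansi", "codepage": None}
    for name, param in re.findall(r"\\([a-z]+)(-?\d+)?", prefix):
        if name in ("ansi", "mac", "pc", "pca"):
            info["charset"] = name
        elif name == "ansicpg" and param:
            info["codepage"] = int(param)
    return info


def decode_hex_escapes(text: str, codepage: int) -> str:
    """Replace each \\'hh escape with its character in a single-byte code page."""
    encoding = f"cp{codepage}"  # e.g. 1252 -> Python's "cp1252" codec
    return re.sub(
        r"\\'([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]).decode(encoding, errors="replace"),
        text,
    )
```

For the header of the sample file in Notes below, parse_header reports version 1, character set ansi, and code page 1252.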

The Document section begins with an optional group, identified by the \info control word, that introduces document properties (metadata). Predefined properties include title, author, and keywords, as well as time of creation, number of pages, number of words, etc. User-defined properties are also supported in the \info group. Following the \info group are groups that apply to the document as a whole, such as settings and styles. A typical document will then have a sequence of high-level groups that represent one or more sections in the document. Each section may begin with a \hdrftr group holding details for the page header and footer for the section. Nested within a section is a sequence of paragraphs; the control word \par ends a paragraph, and \pard resets paragraph formatting to its defaults. A table is considered a particular type of paragraph. Nested within a regular paragraph will be a sequence of groups representing runs of character text with control words for associated formatting, font choice, etc. A small example file is in Notes below. The ASCII text content in an RTF file is in reading order and simple to extract for indexing, as Witten et al. point out on page 202 of "How to Build a Digital Library."
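Simple text-valued properties in the \info group can be pulled out with brace matching and a regular expression. This is a sketch under stated assumptions (Python; group_end, info_properties, and TEXT_PROPERTIES are hypothetical names): only a subset of the property control words is handled, values containing escapes, \'hh bytes, or nested groups are skipped rather than decoded, and date groups such as \creatim are ignored.

```python
import re
from typing import Dict

# A subset of the text-valued property control words allowed in \info.
TEXT_PROPERTIES = ("title", "subject", "author", "manager", "company",
                   "operator", "category", "keywords", "doccomm")


def group_end(rtf: str, start: int) -> int:
    """Return the index just past the '}' that closes the group opened at rtf[start]."""
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character, so \{ \} \\ do not count
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")


def info_properties(rtf: str) -> Dict[str, str]:
    """Map property names (e.g. 'title') to their plain-text values."""
    m = re.search(r"\{\\info(?![a-zA-Z])", rtf)
    if m is None:
        return {}
    group = rtf[m.start():group_end(rtf, m.start())]
    names = "|".join(TEXT_PROPERTIES)
    # {\name value} where the value has no braces or backslashes.
    pattern = r"\{\\(" + names + r")(?![a-zA-Z]) ?([^{}\\]*)\}"
    return dict(re.findall(pattern, group))
```

Because the function returns only what it can read without decoding, an empty or partial result should not be taken as evidence that a file lacks metadata.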

As well as being used for document interchange among Microsoft applications, RTF is widely supported for import and/or export by other word-processing software, desktop publishing applications, and other applications that benefit from the import of formatted text. However, such support may be based on a subset of control words or a chronological version other than the latest (version 1.9.1). See Notes below.

Production phase Although an RTF document can be created or edited directly in a text editor, the primary use of RTF is as a middle-state format for interchange between applications or as a final-state format.

Local use

LC experience or existing holdings The Library of Congress has acquired RTF files in collections of personal and organizational papers and via other deposits. The total number of RTF files as of December 2017 is around 25,000 (in comparison with over 15 million PDF files).
LC preference The Library of Congress Recommended Formats Statement (RFS) for textual documents includes Rich Text Format (RTF) as an acceptable digital format, but with low priority compared with XML-based formats and PDF/A.

Sustainability factors

Disclosure

RTF is a proprietary format developed by Microsoft, primarily to allow formatted textual content to be exchanged among Microsoft applications. Microsoft published many versions of the specification between 1992 and 2008 in order to handle features introduced with new versions of Microsoft Word. The latest version was published in association with Word 2007.

    Documentation

Version 1.9.1 of the RTF specification, the latest version, is available for download from Microsoft.

Microsoft has not kept an archive of past versions. Several entities have attempted to compile archives, notably Snake.net and the developers of the latex2rtf tool. See Format Specifications below for links to versions found in various online sources.

Adoption

As described above, RTF has been widely used for exchange of formatted text among Microsoft applications. RTF has also been widely supported by non-Microsoft applications. On his Interglacial site, RTF expert Sean M. Burke says, "RTF is a document language used for exchanging text between different word processors and text-processing applications. RTF is much easier to generate than PDF or PostScript, and is more word-processor friendly than HTML. RTF has been around for over twenty years, while hundreds of other binary formats have come and gone." Similarly, a Tutorial on RTF, created around 2010, points out, "When posting documents to the Web, it is always important to supply alternative file formats. In addition to '.doc,' you should consider formats such as HTML, PDF, and RTF. Of the three, RTF is perhaps the simplest to create."

In addition to support for RTF files in Microsoft applications, most other word-processing and desktop publishing applications can import and/or export RTF files, including: Adobe Illustrator (import); Adobe InDesign (import, and export with options for one story or all stories); LibreOffice (import and export); and Corel's WordPerfect (import and export). Lifewire's "What is an RTF File?" lists several other word-processing applications that can open RTF files, including Google Docs, which can import and export RTF documents. The degree of support may vary, depending on which chronological version of RTF the support is based on, and the version supported may not be obvious from product documentation.

Many applications whose main function takes advantage of formatted text support RTF for import or export. Examples include: IBM's Rational DOORS, a requirements management tool; Maple and Mathematica for mathematical analysis and computation; and R for statistical analysis. See Useful References below for more examples.

Conversion tools that support RTF for import or export include: Adobe Reader DC; UnRTF; and Aspose.Words, which has software toolkits for .NET and Java that support load and save operations for a number of textual formats, including RTF. See Useful References below for more examples.

Recommendations for use of RTF for submission and distribution of documents include: How to File Docket Submissions (U.S. Department of Transportation) and How to Format Tender Submissions for the [UK] Department for Transport from 2012. The Blurb self-publishing service asks for novels to be submitted in RTF; see Preparing your Content.

RTF is listed as acceptable for long-term archiving by some archives, including: Archivematica; [email protected]; and the Deep Blue Repository at the University of Michigan.

    Licensing and patents Microsoft states in the latest RTF specification, "RTF serves as both a standard of data transfer between word processing software, document formatting, and a means of migrating content from one operating system to another." The compilers of this resource infer from this that, although there may be patents associated with features described in the RTF specification, Microsoft has encouraged the use of the specification to build tools that can read or write RTF files. Microsoft does claim copyright, with all rights reserved, on the specification document, but has not objected to the copies of old versions archived by tool developers.
Transparency Can be opened and viewed in text editors. Due to the density of markup information in many RTF files and the unintuitive nature of many RTF control words, users viewing RTF files in a text editor will often not be able to identify the raw textual content. However, the RTF structure does enable simple programmatic extraction of the underlying text in reading order. RTF documents may include hex-encoded binary data, e.g., for images.
Self-documentation The optional \info group can hold document properties (metadata) in named elements used in Microsoft applications. User-defined properties can also be incorporated.
External dependencies None.
Technical protection considerations The RTF specification provides no internal provision for encryption or other technical protection to prevent reading the file. Password protection to prevent editing is supported through the control word \passwordhash, which introduces hex-encoded encrypted data representing the password needed to edit the given RTF document. The compilers of this resource are unable to determine whether this feature has been widely used in practice or impedes use beyond a Windows context. Comments welcome.
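The point under Transparency above, that the RTF structure permits simple programmatic extraction of text in reading order, can be illustrated with the sketch below. It is an illustration under stated assumptions, not a complete converter: the set of skipped destinations in the hypothetical _SKIP set is incomplete, \binN raw data is not handled, the ANSI code page is assumed to be cp1252 unless another single-byte encoding is passed, and \uN values that form UTF-16 surrogate pairs are not recombined. It applies the Unicode fallback rule of RTF 1.5 and later: after \uN, the number of characters set by \ucN (default 1) is skipped.

```python
import re

_TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # groups 1-2: control word and optional parameter
    r"|\\'([0-9a-fA-F]{2})"      # group 3: \'hh byte in the ANSI code page
    r"|\\([^a-zA-Z])"            # group 4: control symbol such as \~ or \{
    r"|([{}])"                   # group 5: group start or end
    r"|([^\\{}\r\n]+)"           # group 6: plain text
    r"|[\r\n]+"                  # bare line breaks (no group): ignored
)

# Destinations whose content is not body text (illustrative, incomplete).
_SKIP = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "object",
         "header", "footer", "generator"}


def rtf_to_text(rtf: str, encoding: str = "cp1252") -> str:
    """Extract body text in reading order (sketch; single-byte code page assumed)."""
    out = []
    stack = []                   # (skip, uc) saved at each "{"
    skip, uc, pending = False, 1, 0
    group_start = False          # True for the first token after "{"
    for m in _TOKEN.finditer(rtf):
        if m.lastindex is None:
            continue             # bare CR/LF carries no meaning
        word, param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append((skip, uc))  # nested groups inherit the skip state
            group_start = True
            continue
        if brace == "}":
            if stack:
                skip, uc = stack.pop()
            group_start = False
            continue
        first, group_start = group_start, False
        if first and (symbol == "*" or word in _SKIP):
            skip = True          # ignorable ({\* ...}) or non-text destination
        if skip:
            continue
        if word is not None:
            if word == "par":
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            elif word == "uc" and param is not None:
                uc = int(param)
            elif word == "u" and param is not None:
                code = int(param)    # signed 16-bit value in RTF
                out.append(chr(code + 65536 if code < 0 else code))
                pending = uc         # drop the ANSI fallback that follows
        elif hexbyte is not None:
            if pending:
                pending -= 1
            else:
                out.append(bytes([int(hexbyte, 16)]).decode(encoding, errors="replace"))
        elif symbol is not None:
            if symbol == "~":
                out.append("\u00a0")  # nonbreaking space
            elif symbol in "\\{}":
                out.append(symbol)    # escaped literal characters
            elif symbol in "\r\n":
                out.append("\n")      # backslash + line break acts like \par
        elif text is not None:
            drop = min(pending, len(text))
            pending -= drop
            out.append(text[drop:])
    return "".join(out)
```

For example, rtf_to_text(r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Caf\'e9\par}") returns "Café" followed by a newline; the font table is skipped and \'e9 is decoded through cp1252.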

Quality and functionality factors

Text
Normal rendering Document with markup that offers support for powerful word-processing functionality, particularly when used with Microsoft Word. Textual content is editable in an ASCII text editor and textual content is conveniently extractable for indexing. Rendering or indexing of non-ASCII characters requires intermediate processing. RTF does not use a UTF encoding.
Integrity of document structure Paragraphs and sections are easily recognized, as are headers and footers. Support is available for higher-level constructs through styles (e.g., for headings), automatically generated tables of contents and indexes, structured templates, and mail merge.
Integrity of layout and display Represents entire layout and formatting as intended by an author using Microsoft Word through Word 2007. Features added to Word in later versions may not be supported. Bi-directional and vertical display of text can be specified. Differences in detail can occur on display if the original fonts used are not available in the system used for viewing.
Support for mathematics, formulae, etc. The control words used to define the display of mathematical notation mirror elements in the Office Math Markup Language (OMML), which is a part of Office Open XML (see DOCX/OOXML_2012).
Functionality beyond normal rendering Can store information associated with the process of creating and reviewing documents, including tracked changes and other annotations. RTF files may include forms designed to be filled in by a reader.

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | rtf |
Internet Media Type | application/rtf; text/rtf | See https://www.iana.org/assignments/media-types/application/rtf (2007) and https://www.iana.org/assignments/media-types/text/rtf (1993).
Magic numbers | {\rtf | From specification. Applies to all RTF versions.
Mac OS file type | RTF | From RTF specification.
Pronom PUID | See note. | Pronom does not have an entry and signature for the entire RTF family, but does have specific signatures for some chronological versions of RTF. See below.
Pronom PUID | fmt/969 | RTF, unnumbered version, prior to 1.0. See https://www.nationalarchives.gov.uk/PRONOM/fmt/969.
Pronom PUID | fmt/45 | RTF, versions 1.0-1.4. See https://www.nationalarchives.gov.uk/PRONOM/fmt/45.
Pronom PUID | fmt/50 | RTF, versions 1.5-1.6. See https://www.nationalarchives.gov.uk/PRONOM/fmt/50.
Pronom PUID | fmt/52 | RTF, version 1.7. See https://www.nationalarchives.gov.uk/PRONOM/fmt/52.
Pronom PUID | fmt/53 | RTF, version 1.8. See https://www.nationalarchives.gov.uk/PRONOM/fmt/53.
Pronom PUID | fmt/355 | RTF, version 1.9. See https://www.nationalarchives.gov.uk/PRONOM/fmt/355.
Wikidata Title ID | Q467454 | See https://www.wikidata.org/wiki/Q467454 for the Rich Text Format Family. Wikidata also has entries for individual versions of RTF.
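Because every RTF file begins with the magic number {\rtf, a quick identification check needs only the first few bytes of a file. The sketch below (Python; is_rtf and rtf_major_version are hypothetical names) checks the signature and reads the major-version digits that follow it. It does not reproduce the Pronom signatures listed above, which distinguish chronological versions of RTF.

```python
import re
from typing import Optional

RTF_MAGIC = b"{\\rtf"  # the five bytes '{', '\', 'r', 't', 'f'


def is_rtf(data: bytes) -> bool:
    """True if the data starts with the RTF magic number."""
    return data.startswith(RTF_MAGIC)


def rtf_major_version(data: bytes) -> Optional[int]:
    """Digits following the magic number (usually 1), or None if absent."""
    m = re.match(rb"\{\\rtf(\d+)", data)
    return int(m.group(1)) if m else None
```

In practice the first 16 bytes or so of a file, read in binary mode, are enough input for both functions.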

Notes

General

Sample file: Immediately below is a short but complete sample RTF file as viewed in an ASCII text editor, with line breaks added for readability:

{\rtf1\ansi\ansicpg1252\uc1\deff1{\fonttbl
{\f0\fswiss\fcharset0\fprq2 Arial;}
{\f1\fswiss\fcharset0\fprq2 Verdana;}
{\f2\froman\fcharset2\fprq2 Symbol;}} {\colortbl;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;} {\stylesheet
{\s0\itap0\nowidctlpar\f0\fs24 [Normal];}}
{\*\generator TX_RTF32 14.0.520.501;}
\sectd\pard\itap0\nowidctlpar\plain\f1\fs36
{Hello RTF Wörld\par\f0\fs24\par
with some symbols: "{\f2\cf1\cb0\chcbpat3\i abc}"\par\par
and some \b bold\b0, \i italic\i0, \ul underlined\ul0 and \strike strikethrough\strike0 text\par\par
some nested styles: {\b\cf1 bold}, {\i\cf2 italic}, {\ul\cf3 underlined}, normal\par\par
and some combined styles: {\b bold+\i italic+\ul underlined} vs. normal.\par\par }}

The example above is a shortened version of one in a message attached to "Writing Your Own RTF Converter", an article by Jani Giannoudis.

Security concerns: "Why Should I Use Rich Text Format (RTF)?" from Solid Documents, a document conversion provider, says, "RTF does not spread viruses. Microsoft Word macro viruses can present big security problems on the Internet. If you send RTF files instead of DOC files by e-mail, you can ensure that harmful macros won't be inadvertently sent to others, but that most of your formatting will be preserved. Since RTF does not use macros, it cannot hide macros that might contain viruses." However, in recent years security issues have surfaced in the handling of imported RTF files in Word and Outlook, based on the ability to embed binary objects in RTF files. One exploit is described in "Using RTF Files as a Delivery Vector for Malware" from December 2015. An InfoWorld article in 2014 also describes a vulnerability. Security updates to various Microsoft applications have addressed the vulnerabilities exposed in 2014 and 2015; see security updates MS14-017 and MS15-033. The compilers of this resource believe that Microsoft has released patches for all currently supported applications affected by known vulnerabilities associated with its handling of RTF files. The compilers of this resource recognize that other applications may have similar vulnerabilities, but are not aware of actual exploits. Comments welcome.
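Since the exploits described above depend on embedded binary objects, a first-pass triage of a collection might flag files containing the control words that introduce objects or raw binary data. The following is only a heuristic sketch (Python; embedded_object_markers and OBJECT_WORDS are hypothetical names), not a security tool: malicious files are often deliberately malformed to evade this kind of pattern matching, and an embedded object is not in itself evidence of malice.

```python
import re
from typing import List

# Control words that introduce embedded objects or raw binary data.
OBJECT_WORDS = {"object", "objemb", "objlink", "objautlink", "objdata", "bin"}


def embedded_object_markers(rtf: str) -> List[str]:
    """Return object-related control words in order of appearance."""
    found = []
    # The first alternative consumes an escaped backslash (\\ in the file),
    # so that literal text such as \\object is not taken for a control word.
    for m in re.finditer(r"\\\\|\\([a-zA-Z]+)", rtf):
        name = m.group(1)
        if name in OBJECT_WORDS:
            found.append(name)
    return found
```

A non-empty result marks a file for closer inspection with dedicated tools rather than settling whether it is safe.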

Other notes: RTF support in non-Microsoft applications has not always been complete. In "Microsoft RTF Specification Nightmare," Hannes Schmidt argued that support in non-Microsoft applications was frequently incomplete, saying "Have you ever seen a word processor other than Microsoft's own office suite member Word that can import an RTF (Rich Text Format) file properly? I have not. The reason for this lies in RTF's inherent complexity and its strong dependency on Microsoft's internal Word document implementation. The RTF format is basically a 7-bit-safe, serialized version of a Word document's in-memory representation plus some tweaks that ensure backward compatibility with older programs that read RTF files. For every version of Microsoft's office suite there is a corresponding RTF specification."

According to a comment on the 2004 blog post by Hannes Schmidt cited above, the RTF specification is "quite a usefull[sic] source of info for the Word binary format." However, since 2008, Microsoft has provided public documentation of the Word binary (.doc) format. See, for example, Specifications for Digital Formats on this site and Office File Formats from Microsoft.

History

An early description of RTF by Microsoft appeared in the March 1987 issue of the Microsoft Systems Journal (MSJ) in "Rich Text Format Standard Makes Transferring Text Easier" by Nancy Andrews; see the index of MSJ articles. Later RTF specifications refer to this March 1987 article as an RTF specification. For specifications that the compilers of this resource have located online, see Format Specifications below. A chronology, annotated with a few important developments, follows:

  • RTF Version 0 (March 1987). No online copy located.
    First description.
  • RTF Version 1.0 (June 1992)
  • RTF Version 1.2 (undated)
  • RTF Version 1.3 (January 1994) for Word 6.0.
  • RTF Version 1.4 (1995) for Word 95. No copy located; description at Google Books.
  • RTF Version 1.5 (April 1997) for Word 97.
    According to https://www.iana.org/assignments/media-types/application/rtf, "Prior to version 1.5 textual content in RTF was 7-bit only. 1.5 changed this to allow unencoded 8-bit characters." This change was facilitated by the new control word \ansicpgN and a set of control words beginning \u to introduce groups with Unicode characters.
  • RTF Version 1.6 (May 1999) for Word 2000.
    In Word 2000, Microsoft introduced nested tables, i.e., tables within cells of a table.
  • RTF Version 1.7 (August 2001) for Word 2002.
  • RTF Version 1.8 (April 2004) for Word 2003.
  • RTF Version 1.9.1 (March 2008) for Word 2007.
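The Unicode mechanism introduced with RTF 1.5 can be shown in a small writer. This sketch (Python; rtf_escape and minimal_rtf are hypothetical names) keeps its output in 7-bit ASCII, escapes the syntax characters \, {, and }, and writes each non-ASCII character as \uN, where N is the character's code value as a signed 16-bit decimal number, followed by a single ? as the fallback declared by \uc1. Characters outside the Basic Multilingual Plane are rejected rather than encoded, and the font and size settings are arbitrary choices.

```python
def rtf_escape(text: str) -> str:
    """Encode text for an RTF body using only 7-bit ASCII (BMP characters only)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch in "\\{}":
            parts.append("\\" + ch)      # escape RTF syntax characters
        elif ch == "\n":
            parts.append("\\par\n")      # paragraph break; newline only for readability
        elif ch == "\t":
            parts.append("\\tab ")
        elif cp < 128:
            parts.append(ch)
        elif cp <= 0xFFFF:
            # \uN takes a signed 16-bit decimal value; the trailing '?' is the
            # one-character fallback declared by \uc1 for non-Unicode readers.
            parts.append(f"\\u{cp - 65536 if cp > 32767 else cp}?")
        else:
            raise ValueError(f"U+{cp:X} is outside the BMP; not handled in this sketch")
    return "".join(parts)


def minimal_rtf(text: str) -> str:
    """Wrap escaped text in a minimal RTF document: header, font table, one paragraph."""
    return (
        "{\\rtf1\\ansi\\ansicpg1252\\uc1\\deff0"
        "{\\fonttbl{\\f0\\fswiss Arial;}}\n"
        "\\pard\\f0\\fs24 " + rtf_escape(text) + "\\par\n}"
    )
```

For example, rtf_escape("Wörld") returns W\u246?rld, whereas the sample file in Notes above stores the same character as a raw 8-bit byte.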

With the introduction of the ECMA-376 OOXML format in Word 2007, Microsoft issued compatibility fixes for Word 2002 and Word 2003, to allow those applications to read .docx files. As a result, many RTF control words introduced for Word 2007 may be interpreted appropriately by earlier versions of Word, rather than ignored.

Since 2008, the RTF specification has been frozen. Hence, for all versions of Word later than Word 2007, RTF files saved from Word are vulnerable to information loss.


Format specifications


Useful references

URLs

Books, articles, etc.

Last Updated: 12/20/2017