Sustainability of Digital Formats: Planning for Library of Congress Collections |
Full name | Internet Message Format |
---|---|
Description |
Internet Message Format (IMF) is the standardized ASCII-based syntax required by SMTP for all email message bitstreams used by a message transfer agent, sometimes referred to as a mail transfer agent or MTA, when moving messages between computers. IMF is standardized by RFC 5322. IMF syntax itself does not cover non-text data in email messages, such as images, audio or other sorts of structured data; these are described in the MIME document series (RFC 2045, RFC 2046, RFC 2049). IMF requires that messages use only US-ASCII characters and that the characters are divided into lines. A line is a series of characters delimited by carriage-return (CR) immediately followed by line-feed (LF). Taken together, these are commonly abbreviated as CRLF. Each line is limited to no more than 998 characters and, for the sake of interoperability, is encouraged to be no more than 78 characters. An IMF-compliant email message consists of a header section composed of defined fields followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in RFC 5322. The body is simply a sequence of characters that follows the header section and is separated from it by an empty line (i.e., a line with nothing preceding the CRLF). Header fields are well-defined lines beginning with a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. Header field bodies may have a structured or unstructured syntax. Header fields may appear in any order, and they have been known to be reordered occasionally when transported over the Internet. Selected fields may repeat within the header. The required header fields are the origination date ("Date:") and the originator address ("From:").
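All other header fields are optional and include: reply-to, to, cc, bcc, message-id, in-reply-to, references, subject, comments, and keywords. Message bodies are simply lines of US-ASCII characters, subject to two essential requirements: CR and LF must occur only together as CRLF, and each line must be no more than 998 characters (and should be no more than 78 characters), excluding the CRLF.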
|
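The line and header rules described above can be illustrated with a short Python sketch. It is not a full RFC 5322 parser: the function names (`split_message`, `parse_fields`, `check_message`) are hypothetical, the unfolding of continuation lines follows the folding rule in RFC 5322 (which the description above does not cover), and the check that "Date:" and "From:" each appear exactly once is taken from RFC 5322 rather than from this page.

```python
CRLF = b"\r\n"
HARD_LIMIT = 998  # maximum characters per line, excluding the CRLF
SOFT_LIMIT = 78   # recommended maximum, excluding the CRLF


def split_message(raw: bytes):
    """Split a raw message into (header_lines, body_lines)."""
    # The header section ends at the first empty line, i.e. CRLF CRLF.
    head, separator, body = raw.partition(CRLF + CRLF)
    if head.endswith(CRLF):
        head = head[: -len(CRLF)]
    header_lines = head.split(CRLF)
    body_lines = body.split(CRLF) if separator else []
    return header_lines, body_lines


def parse_fields(header_lines):
    """Unfold continuation lines and return a list of (name, body) pairs."""
    fields = []
    for line in header_lines:
        if line[:1] in (b" ", b"\t") and fields:
            # A line starting with white space continues the previous field.
            name, value = fields[-1]
            fields[-1] = (name, value + line)
        else:
            name, _, value = line.partition(b":")
            fields.append((name.decode("ascii", errors="replace").strip().lower(), value))
    return [(n, v.decode("ascii", errors="replace").strip()) for n, v in fields]


def check_message(raw: bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if any(byte > 127 for byte in raw):
        problems.append("non-ASCII byte found")
    # CR and LF may only appear together as CRLF.
    remainder = raw.replace(CRLF, b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("bare CR or LF found")
    header_lines, body_lines = split_message(raw)
    for line in header_lines + body_lines:
        if len(line) > HARD_LIMIT:
            problems.append(f"line exceeds {HARD_LIMIT} characters")
        elif len(line) > SOFT_LIMIT:
            problems.append(f"warning: line exceeds {SOFT_LIMIT} characters")
    names = [name for name, _ in parse_fields(header_lines)]
    for required in ("date", "from"):
        if names.count(required) != 1:
            problems.append(f"required field '{required}' must appear exactly once")
    return problems


sample = (
    b"From: alice@example.com\r\n"
    b"Date: Mon, 1 Jan 2024 10:00:00 +0000\r\n"
    b"Subject: a folded\r\n subject line\r\n"
    b"\r\n"
    b"Hello\r\n"
)
print(parse_fields(split_message(sample)[0]))
print(check_message(sample))
```

Working on bytes rather than decoded text keeps the CRLF and line-length checks faithful to the bitstream as it would be transferred. For real-world parsing, Python's standard `email` package is the more robust choice.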
Relationship to other formats | |
Used by | MBOX, MBOX Email Format |
Used by | EML, Email (Electronic Mail Format) |
Used by | PST_ANSI, Microsoft Outlook 97-2002 Personal Folders File (ANSI) |
Used by | PST_Unicode, Microsoft Outlook 2003 Personal Folders File (Unicode) |
Affinity to | CCA, cc:Mail Archive Email Format |
Affinity to | CPIM, CPIM Instant Message Format. Similar header syntax |
LC experience or existing holdings | Not directly applicable because IMF is a syntax rather than a separate format. However, the Library's collections do contain email formats defined by IMF. See EML, MBOX Family and MSG for examples. |
---|---|
LC preference | See the Recommended Formats Statement for the Library of Congress format preferences for Email content. |
Disclosure | Fully documented |
---|---|
Documentation | IMF is fully documented in RFC 5322 and its antecedents, RFC 2822 and RFC 822. |
Adoption | IMF is the standard syntax defined by IETF for the message bitstream when moving email messages from one computer to another. As such, it is widely adopted and interoperable with many tool sets and applications. |
Licensing and patents | None |
Transparency |
IMF files are US-ASCII text and so are accessible through plain-text processing tools. |
Self-documentation | Metadata is available through the well-structured header fields. |
External dependencies | None |
Technical protection considerations | None |
Tag | Value | Note |
---|---|---|
Filename extension | Not applicable. | See related email formats |
Internet Media Type | message/rfc822 | This is the common MIME type for all formats based on RFC 822. |
Magic numbers | Not applicable. | See related email formats. |
Pronom PUID | fmt/278 | See http://www.nationalarchives.gov.uk/PRONOM/fmt/278. |
Wikidata Title ID | Q82721505 | See https://www.wikidata.org/wiki/Q82721505. |
General | IMF has been developed in step with the Simple Mail Transfer Protocol (SMTP). SMTP is the widely used protocol for sending email messages from the author's mail program or email client to the mail server, and between servers. Where SMTP is equivalent to the message envelope, IMF is equivalent to the letter within the envelope. Receiving mail from a server is accomplished using POP or IMAP. |
---|---|
History |
RFC 822, published in 1982, established the framework for the header structure and was widely used. Revisions and refinements to this structure include RFC 1123 (1989), RFC 2822 (2001) and most recently RFC 5322 (2008). RFC 5322 includes this summary of the changes between RFCs: “One important difference between the obsolete (interpreting) and the current (generating) syntax is that in structured header field bodies (i.e., between the colon and the CRLF of any structured header field), white space characters, including folding white space, and comments could be freely inserted between any syntactic tokens. This allowed many complex forms that have proven difficult for some implementations to parse. Another key difference between the obsolete and the current syntax is that the rule … regarding lines composed entirely of white space in comments and folding white space does not apply. The NUL character (ASCII value 0) was once allowed, but is no longer for compatibility reasons. Similarly, USASCII control characters other than CR, LF, SP, and HTAB (ASCII values 1 through 8, 11, 12, 14 through 31, and 127) were allowed to appear in header field bodies. CR and LF were allowed to appear in messages other than as CRLF.” |
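The character-level differences listed in the quoted passage lend themselves to a simple scan. The following Python sketch is illustrative only and uses a hypothetical function name, `find_obsolete_characters`. It assumes the header section ends at the first empty line, and it flags control characters anywhere in the header section rather than only inside field bodies, which is slightly stricter than the passage describes.

```python
# Control characters the quoted passage lists as formerly allowed in header
# field bodies: ASCII 1 through 8, 11, 12, 14 through 31, and 127.
OBSOLETE_CONTROLS = set(range(1, 9)) | {11, 12} | set(range(14, 32)) | {127}


def find_obsolete_characters(raw: bytes):
    """Return (offset, description) pairs for characters only the obsolete syntax allowed."""
    # Assumption: the header section ends at the first empty line (CRLF CRLF).
    # If there is no empty line, the whole message is treated as header.
    end = raw.find(b"\r\n\r\n")
    header_end = len(raw) if end == -1 else end

    findings = []
    for offset, byte in enumerate(raw):
        if byte == 0:
            # NUL is flagged anywhere in the message.
            findings.append((offset, "NUL"))
        elif byte in OBSOLETE_CONTROLS and offset < header_end:
            # Simplification: flagged anywhere in the header section.
            findings.append((offset, f"control character {byte}"))
        elif byte == 13 and raw[offset + 1:offset + 2] != b"\n":
            findings.append((offset, "CR not followed by LF"))
        elif byte == 10 and (offset == 0 or raw[offset - 1] != 13):
            findings.append((offset, "LF not preceded by CR"))
    return findings


legacy = b"Subject: hi\x07\r\n\r\nbody\x00 line\rmore\n"
for offset, description in find_obsolete_characters(legacy):
    print(offset, description)
```

A message that yields no findings may still use other obsolete constructs, such as comments or white space between tokens in structured header fields. Detecting those would require parsing the header grammar itself.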
|