Sustainability of Digital Formats: Planning for Library of Congress Collections


Short Message Service (SMS) Message Format

Format Description Properties

Identification and description

Full name Short Message Service (SMS) Message Format

Short Message Service (SMS) messages, as defined in RFC 5724, are short two-way alphanumeric paging messages that can be sent to and from SMS clients. SMS clients, including the text messaging components of phone, Web, and other mobile communication systems, are an integral part of GSM (Global System for Mobile Communications) network technology. SMS uses standardized communications protocols to allow fixed-line or mobile phone devices to exchange short text messages. SMS messages can transport almost any kind of data, within the character limit.

One of the defining characteristics of SMS messages is the maximum length of 160 7-bit characters (140 octets). Early drafts of the specification did not define a standardized method for concatenating SMS messages. Longer messages can now be formed by concatenation, using the method based on the header in the TP-User Data field as specified in 3GPP TS 23.040, although compliance with this method is not required. It is up to the user agent to decide whether to limit the length of the message and, if necessary, how to indicate this limit in its user interface.
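The length arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only, not taken from RFC 5724 or 3GPP TS 23.040: the function names are hypothetical, the 6-octet concatenation header is an assumption about one common header layout, and every character is assumed to occupy exactly one 7-bit unit.

```python
MAX_OCTETS = 140  # payload size of a single SMS message, as stated above


def septets_available(header_octets=0):
    """Number of 7-bit characters that fit after reserving header octets."""
    return (MAX_OCTETS - header_octets) * 8 // 7


def segments_needed(n_chars, header_octets=6):
    """Estimate how many messages a text of n_chars 7-bit characters needs.

    header_octets=6 is an assumed size for the concatenation header;
    a real implementation may use a different value.
    """
    if n_chars <= septets_available():
        return 1  # fits in a single, unconcatenated message
    per_segment = septets_available(header_octets)
    return -(-n_chars // per_segment)  # ceiling division
```

Under these assumptions a single message holds 160 characters, while each part of a concatenated message holds fewer because the header consumes part of the 140 octets.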

Character set also affects the character limit. 7-bit characters from the 3GPP TS 23.038 GSM character set are the default, although specific applications may support other character sets. Using another character set may change the character limit; for example, UCS-2 16-bit characters result in 70-character messages. If other character sets are used, applications handling SMS messages are required to map the character sets to and from the character set used for SMS messages. Implementations may choose to discard (or convert) characters in the message body that are not supported by the SMS character set they are using to send the SMS message. If they do discard or convert characters, applications must notify the user.
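The relationship between character width and message length, and the discard-and-notify behavior, might be sketched as follows. This is a minimal illustration under stated assumptions: `max_chars` and `filter_unsupported` are hypothetical names, and the `supported` set is supplied by the caller rather than being the actual GSM character set.

```python
MAX_OCTETS = 140  # payload size of a single SMS message


def max_chars(bits_per_char):
    """Characters that fit in one message for a given character width."""
    return MAX_OCTETS * 8 // bits_per_char


def filter_unsupported(text, supported):
    """Discard characters outside `supported`.

    Returns (kept_text, dropped_chars) so the caller can notify the
    user about anything that was removed.
    """
    kept = [c for c in text if c in supported]
    dropped = [c for c in text if c not in supported]
    return "".join(kept), dropped
```

For example, `max_chars(7)` gives 160 and `max_chars(16)` gives 70, matching the limits described above. An application would report a non-empty `dropped_chars` list to the user before sending.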

The restricted character limit has encouraged the development and adoption of Internet slang, which includes abbreviations, acronyms, keyboard signals, emojis, and shortened URLs, to save keystrokes and compensate for the small character limit.

Local use

LC experience or existing holdings None
LC preference The Library of Congress has not yet specified a format preference for instant messages.  

Sustainability factors

Disclosure Fully documented.
    Documentation The SMS format is documented in RFC 5724, which is on the IETF Standards Track. Documents from 3GPP define other aspects of the transmission process. See Format Specifications for details.
Adoption Highly adopted. According to Wikipedia, "SMS was the most widely used data application, with an estimated 3.5 billion active users, or about 80% of all mobile phone subscribers at the end of 2010." The article notes that SMS is being challenged by "alternative messaging services such as Facebook Messenger, WhatsApp and Viber available on smart phones with data connections, especially in Western countries where these services are growing in popularity." Twitter's 140-character limit for tweets was designed to comply with the SMS format. In addition to cellular phones, most satellite phones support SMS.
    Licensing and patents None
Transparency Apart from when encrypted, SMS files are usually simple text files and can be opened in Notepad or a web browser.
Self-documentation Metadata is available through the well-structured header fields.
External dependencies None
Technical protection considerations SMS messages sent on the Global System for Mobile Communications (GSM) network have optional, limited encryption that covers only the airway traffic between the Mobile Station (MS) and the Base Transceiver Station (BTS). This encryption uses a "weak and broken A5/1 or A5/2 stream cipher. The authentication is unilateral and also vulnerable." See Notes for more about security issues.

Quality and functionality factors

File type signifiers and format identifiers

Notes


SMS messages are transmitted through a formal "sms" URI syntax defined in RFC 5724. The syntax includes declarations for an sms-recipient, a telephone-subscriber (or telephone number), and a message body.
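As a rough illustration of that syntax, the sketch below splits an "sms" URI into its recipients and message body. It is a simplified sketch, not a complete RFC 5724 parser: the function name `parse_sms_uri` is hypothetical, the telephone numbers are illustrative, and recipient validation is omitted.

```python
from urllib.parse import unquote


def parse_sms_uri(uri):
    """Split an sms URI into (recipients, body); body is None if absent."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "sms":
        raise ValueError("not an sms URI")
    # Recipients precede the optional "?"; several may be comma-separated.
    recipients_part, _, query = rest.partition("?")
    recipients = [r for r in recipients_part.split(",") if r]
    body = None
    for field in query.split("&"):
        name, _, value = field.partition("=")
        if name == "body":
            body = unquote(value)  # decode percent-encoded characters
    return recipients, body
```

For instance, `parse_sms_uri("sms:+15105550101?body=hello%20there")` yields one recipient and the body `hello there`.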

According to Wikipedia, SMS messages are vulnerable to security issues partially "due to its store-and-forward feature, and the problem of fake SMS that can be conducted via the Internet. When a user is roaming, SMS content passes through different networks, perhaps including the Internet, and is exposed to various vulnerabilities and attacks." One of these is SMS Spoofing in which an account roaming on a foreign network is "hijacked" to send messages into other networks.


Format specifications

Useful references


Last Updated: 07/20/2023