PROPOSAL NO.: 2005-05

DATE: December 10, 2004; updated Dec. 15, 2004

NAME: Change of Unicode mapping for the Extended Roman "alif" character


SUMMARY: This paper proposes a change to the mapping from MARC 8 to Unicode for the "alif" character in the Extended Latin set. The new mapping better accommodates the diverse uses of the "alif" character and more closely matches its typical graphic representation.

KEYWORDS: "alif" character (all formats); Unicode (all formats); MARC 8 (all formats); ISO 5426 (all formats)


12/10/04 - Made available to the MARC 21 community for discussion.

01/15/05 - Results of the MARC Advisory Committee discussion - Approved

03/02/05 - Results of LC/LAC/BL review - Approved

Proposal No. 2005-05: Change of Unicode Mapping for the Latin "alif" Character


The character labeled "alif" in the MARC 21 Extended Latin character set (ANSI/NISO Z39.47, ANSEL) is a spacing character used in the ALA-LC Romanization Tables as the romanized representation of the Arabic letter alif, the Hebrew letter alef, and the alif in the alphabets of other languages written in Arabic script, such as Moplah, Ottoman Turkish, Persian, Pushto, Urdu, and Malay (Jawi). It is also used in the romanization of unrelated languages such as Amharic (an Ethiopian language), Burmese, Khmer, and Korean. It may also be used in some Latin-alphabet languages such as Indonesian and Turkish.

In published sources (MARC 21 Specifications, the ANSEL standard, the ALA-LC Romanization Tables), the "alif" usually appears as an apostrophe-like character, with a circular head and a descending tail pointing to the left. The (romanized) alif is also a character in International Standard ISO 5426, Extension of the Latin Alphabet Coded Character Set, where its image is likewise apostrophe-like.

The mappings for MARC 21 Extended Latin set to Unicode and for ISO 5426 to Unicode currently specify different Unicode equivalents for the Latin alphabet "alif" character.

The MARC 21 mapping is: U+02BE (MODIFIER LETTER RIGHT HALF RING), which looks like a small superscript arc open to the left.

The ISO 5426 mapping is: U+02BC (MODIFIER LETTER APOSTROPHE), which looks like an apostrophe.

While the preferred mapping today for the Extended Latin "alif" would appear to be U+02BC, the mapping to U+02BE was established for MARC 21 in 1996. A change to the mapping needs to be carefully considered.
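To make the divergence concrete, the following Python sketch lists the Unicode character that each mapping assigns to the romanized "alif" and shows that the same romanized word yields two different strings depending on the source. The dictionary keys and the sample word are illustrative choices for this sketch; only the two code points come from this proposal.

```python
import unicodedata

# Unicode equivalents currently assigned to the romanized "alif" by each
# mapping, as stated in this proposal. The keys are illustrative labels.
ALIF_MAPPINGS = {
    "MARC 21 Extended Latin (1996)": "\u02BE",
    "ISO 5426": "\u02BC",
}


def describe(mappings):
    """Return (source, code point, Unicode character name) for each mapping."""
    return [
        (source, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for source, ch in mappings.items()
    ]


if __name__ == "__main__":
    for source, cp, name in describe(ALIF_MAPPINGS):
        print(f"{source}: {cp} {name}")
    marc, iso = ALIF_MAPPINGS.values()
    # The same romanized word becomes two distinct strings, one per source.
    print(f"Qur{marc}an" == f"Qur{iso}an")
```

Running it prints the names MODIFIER LETTER RIGHT HALF RING and MODIFIER LETTER APOSTROPHE, followed by False for the string comparison.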


The Unicode character U+02BE (MODIFIER LETTER RIGHT HALF RING) is annotated in the Unicode Standard as follows:
"transliteration of Arabic hamza (glottal stop)."

The Unicode character U+02BC (MODIFIER LETTER APOSTROPHE) is annotated:
"glottal stop, glottalization, ejective"
"spacing clone of Greek smooth breathing mark"
"many languages use this as a letter of their alphabets"

The specific "Arabic hamza" annotation on Unicode U+02BE was the reason for the original MARC mapping to that character. However, the MARC 21 character, while labeled "alif", has diverse uses, serving several alphabets and purposes, as does the ISO 5426 "alif". For that reason, the Unicode Technical Committee recommended, and the committee for ISO 5426 adopted, Unicode U+02BC rather than the more narrowly defined U+02BE for the mapping of the ISO Latin "alif" character.

A companion character to the Latin "alif" is the Latin "ayn". It was originally mapped to Unicode U+02BF (MODIFIER LETTER LEFT HALF RING), but in 1999 the mapping was changed to the current one, Unicode U+02BB (MODIFIER LETTER TURNED COMMA). The change was based on the use of the Latin "ayn" for multiple scripts and purposes and on the closer similarity of the U+02BB shape to the forms used.

Thus the technical advantages of the change in mapping are:
1) It is compatible with the ISO 5426 mapping, avoiding special handling on record import and export.
2) It avoids alternative representations of the same character when data from the ANSEL and from the ISO sets are mixed, simplifying applications such as indexing, searching, and sorting.
3) It is the preferred encoding of the character when it is used with multiple scripts.
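Advantage 2 can be illustrated with a small Python sketch of an index key. Standard Unicode normalization does not unify the two characters, because neither U+02BE nor U+02BC has a decomposition, so an application indexing mixed ANSEL-derived and ISO-derived data would need an extra, application-specific folding step. The function `index_key` and its fold step are hypothetical, shown only to illustrate the workaround that a single mapping makes unnecessary.

```python
import unicodedata

ALIF_OLD = "\u02BE"  # MODIFIER LETTER RIGHT HALF RING (1996 MARC 21 mapping)
ALIF_NEW = "\u02BC"  # MODIFIER LETTER APOSTROPHE (ISO 5426 and proposed mapping)


def index_key(text: str) -> str:
    """Build a simple search key: NFKC, case folding, then fold old alif to new.

    The final replace() is a hypothetical workaround needed only while both
    mappings coexist in the data; it is not part of any standard.
    """
    key = unicodedata.normalize("NFKC", text).casefold()
    return key.replace(ALIF_OLD, ALIF_NEW)


# The same romanized title as it would arrive from the two sources.
from_ansel = f"Qur{ALIF_OLD}an"
from_iso = f"Qur{ALIF_NEW}an"

# Normalization alone leaves the two strings distinct.
print(unicodedata.normalize("NFKC", from_ansel) == unicodedata.normalize("NFKC", from_iso))
# The extra fold makes them match.
print(index_key(from_ansel) == index_key(from_iso))
```

The first comparison prints False and the second prints True. Once the MARC 21 mapping matches ISO 5426, the fold step can be dropped and plain normalization is enough.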

There are aesthetic advantages as well:
1) Different mappings would lead to different graphic presentations of the character depending on the data source.
2) The right and left half rings (U+02BE, U+02BF) are a glyphic pair, as are U+02BC and U+02BB, yet the current mapping takes the "alif" from one pair and the "ayn" from the other.
3) U+02BC more closely resembles the glyphs for the MARC 21 Extended Latin "alif" as it appears in many sources.


Change the mapping of the MARC 21 Extended Latin character "alif" (M+AE) from U+02BE to U+02BC.
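The proposed change can be sketched as an excerpt of a MARC 8 Extended Latin to Unicode conversion table, together with a clean-up pass for Unicode records already converted under the superseded mappings. This is a minimal Python sketch under stated assumptions: the table and function names are hypothetical; the ayn byte value 0xB0 comes from the MARC 21 code tables rather than from this proposal; and the decoder handles only ASCII plus these two characters. The clean-up pass assumes the text was produced by a MARC 8 conversion, since U+02BE or U+02BF entered directly in Unicode data may be intentional.

```python
# Hypothetical excerpt of a MARC 8 Extended Latin (G1) -> Unicode table,
# limited to the two spacing characters discussed in this proposal.
EXTENDED_LATIN_TO_UNICODE = {
    0xAE: "\u02BC",  # alif: mapped to U+02BE since 1996, proposed U+02BC
    0xB0: "\u02BB",  # ayn: U+02BB since 1999 (originally U+02BF)
}

# Code points produced by the superseded mappings, and their replacements.
SUPERSEDED = {
    "\u02BE": "\u02BC",  # alif
    "\u02BF": "\u02BB",  # ayn
}


def decode_spacing(data: bytes) -> str:
    """Decode ASCII plus the two Extended Latin spacing characters above.

    A real MARC 8 decoder must also handle escape sequences, the remaining
    Extended Latin characters, and combining diacritics; this sketch does not.
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b in EXTENDED_LATIN_TO_UNICODE:
            out.append(EXTENDED_LATIN_TO_UNICODE[b])
        else:
            raise ValueError(f"byte 0x{b:02X} not covered by this sketch")
    return "".join(out)


def migrate(text: str) -> str:
    """Rewrite text converted under the superseded alif/ayn mappings."""
    return text.translate(str.maketrans(SUPERSEDED))
```

For example, `decode_spacing(b"Qur\xaean")` returns "Qur" + U+02BC + "an", and `migrate` turns an older conversion containing U+02BE into the same string.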

Library of Congress Help Desk ( 03/02/2005 )