DATE: December 14, 2010

NAME: Coding for Original Language in Field 041 (Language Code) of the MARC 21 Bibliographic Format

SOURCE: Online Audiovisual Catalogers (OLAC)

SUMMARY: This paper proposes redefining $h so that it only contains the language code of the original (regardless of whether the resource is a translation), and defining new subfields for language code of intermediate translation and language code of the original language of subsidiary materials.

KEYWORDS: Field 041 (BD); Language code (BD)

12/14/10 – Made available to the MARC community for discussion.

1/8/11 - Results of MARC Advisory Committee discussion: Approved as amended. The revised definition of subfield $h will specify that the language code is for the original language of the primary content of the item and that it is not required to use it if the item is not a translation. A new subfield $n will be added for original language of libretto and libretto will be removed from the description of subfield $m.

02/17/11 - Results of LC/LAC/BL review - Agreed with the MARBI decision.

Online Audiovisual Catalogers, Inc. (OLAC) would like a place in the MARC record where catalogers can optionally record the original language of the primary moving image work(s) on a bibliographic record in an explicit, unambiguous, machine-actionable way. Language-related information currently appears in coded form in bibliographic records in field 041 (Language code). However, the existing structure of 041 does not support reliable, accurate retrieval of original language.

At the MARBI meeting in June 2010 OLAC presented a discussion paper (Discussion Paper No. 2010-DP05: Language Coding in Field 041) attempting to address this problem. It was suggested that a new subfield be added to field 041 where the original language of the main work could prospectively be coded explicitly, rather than included in subfield $h, which also includes a language of intermediate translation. A couple of other coding options were presented in the discussion paper. However, some members of the MARC Advisory Committee were unsure of the necessity for this change and thought that it might have undesirable consequences for other formats or complex coding situations. This proposal attempts to present another approach and to provide more examples demonstrating potential impact.

Field 041 is currently defined as follows:


The original language of resources may currently be recorded in bibliographic records in 041 $h (Language code of original and/or intermediate translations of text) when a translation is involved. The subfield was originally intended to be used only for translations, i.e., when the first indicator has value 1, unless it records the original language of a translated libretto or other accompanying material and follows subfield $e or $g.

An additional complication is that 041$h is simultaneously used to record other types of language information. For example, it is used for languages of intermediate translations. There is no way to retrospectively identify when 041$h has been used for this purpose.

Subfield 041$h is also used to represent original languages of things other than main works. Most commonly, it is used by the music community to record the original language of librettos (041$e) or accompanying material (041$g) such as liner notes. Each 041$h is supposed to follow the subfield to which it refers, so in theory a computer could interpret these by position. However, subfield order is often unreliable, and programming a computer to parse information by position is more complicated than reading a dedicated subfield. Defining a single subfield that could be put in a single index would be a more practical way to produce machine-actionable data.
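The dependence on subfield order can be illustrated with a minimal sketch. The function names and the two sample strings below are hypothetical and are not taken from any MARC toolkit or real record; the sketch assumes the field data is available as a flat string of $-delimited subfields. It attaches each $h to the nearest preceding subfield, which is the only interpretation available under the current definition.

```python
import re

def parse_subfields(field_data):
    """Split a string such as '$aeng$hfre' into ordered (code, value) pairs."""
    return re.findall(r"\$([a-z0-9])([^$]*)", field_data)

def attribute_h(subfields):
    """Attach each $h value to the nearest preceding non-$h subfield code."""
    attributed = []
    last_code = None
    for code, value in subfields:
        if code == "h":
            attributed.append((value.strip(), last_code))
        else:
            last_code = code
    return attributed

# Hypothetical data: the same subfields, stored in two different orders.
in_order = parse_subfields("$dfre$hita$eeng$hita")
reordered = parse_subfields("$dfre$eeng$hita$hita")

print(attribute_h(in_order))   # [('ita', 'd'), ('ita', 'e')]
print(attribute_h(reordered))  # [('ita', 'e'), ('ita', 'e')]
```

When the order is disturbed, the original language of the sung text ($d) is silently attributed to the libretto ($e), and nothing in the data signals the error.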

Subfield $h is currently defined as follows:

Language code of original and/or intermediate translations of text
Language code(s) for intermediate translations; codes precede those for original languages.
For music, when printed or manuscript music, sound recordings, or the accompanying material for these items is or includes a translation, subfield $h may follow the related subfield $a, $d, $e, or $g. Note that the first indicator position may contain value 0 indicating that the main item is not a translation when the language coding of the original refers to language of librettos (in subfield $e) or language of table of contents (in subfield $g).

OLAC would therefore like to suggest a different approach that would provide the functionality we are looking for with, we hope, less disruption to other cataloging communities.

  1. Redefine 041 $h to be Language code of the primary original text/soundtrack regardless of whether or not a translation is involved. For silent films the original language would be coded based on the original language of the intertitles. The majority of existing 041 $h represent the original language of the primary work(s). Note that it is not required to supply the original language if the resource is not a translation.

  2. Make a new subfield $k for language(s) of intermediate translation.

  3. Make a new subfield $m for original language(s) of subsidiary materials, such as librettos and accompanying materials. Using one subfield would require that subfield $m follow the subfield to which it applies. This approach has flaws: it cannot be guaranteed that subfield order is retained, and whether the results are as expected may depend on how a given system implements language codes. Alternatively, separate subfields could be defined for each type of subsidiary material identified.

This approach does not help with retrospective data, but it would enable a clean start going forward. For moving images, $h is rarely used for anything other than original language (the main exception being some music materials), so systems could more easily be configured to offer the search by original language that users of moving image materials would like.
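As a rough illustration of what such a configuration might look like, the sketch below builds an original-language index by reading $h alone, with no positional logic. It assumes the proposed definition of $h; the record identifiers and the helper name subfield_values are hypothetical, and the 041 data is copied from Examples 1, 4, and 5 of this paper.

```python
import re
from collections import defaultdict

def subfield_values(field_data, code):
    """Return every value of one subfield code, in order of appearance."""
    return [value.strip()
            for found, value in re.findall(r"\$([a-z0-9])([^$]*)", field_data)
            if found == code]

# Hypothetical record identifiers; 041 data from Examples 1, 4, and 5.
records = {
    "citizen-kane-dvd": "$aeng$heng$jeng$jfre$jpor$jspa$geng",
    "mahavairocana-sutra": "$aeng$kchi$hsan",
    "spanish-book": "$aspa$hspa",
}

# Under the proposed definition, every $h is an original language of the
# primary work, so the index needs no knowledge of neighboring subfields.
index = defaultdict(list)
for record_id, field_data in records.items():
    for language in subfield_values(field_data, "h"):
        index[language].append(record_id)

print(dict(index))
# {'eng': ['citizen-kane-dvd'], 'san': ['mahavairocana-sutra'], 'spa': ['spanish-book']}
```

The intermediate language in the second record (chi) stays out of the index because it is carried in $k rather than $h.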


Example 1. Citizen Kane DVD with an English soundtrack and English, French, Portuguese and Spanish subtitle tracks.

041 1# $aeng$heng$jeng$jfre$jpor$jspa$geng

Example 2. Recording of an opera sung in French, original opera in Italian, libretto in English, French, German, and Italian, liner notes in English, French, German, Italian, Japanese, and Spanish but liner notes known to be originally written in German.

041 1# $dfre$hita$eeng$efre$eger$eita$mita$geng$gfre$gger$gita$gjpn$gspa$mger

Example 3.  Multilingual text not known to be based on a separate original text. Since the use of $h is optional, it is not necessary to record anything when the original language is not known or would not be considered to be important to users.

Multilingual list of narcotic drugs under international control.
In English, French, Spanish, and Russian.

041 0# $aeng$afre$aspa$arus

Example 4. Text with intermediate language of translation.

Mahāvairocana-sūtra : translated into English from Ta-p'i lu che na ch'eng-fo shen-pien chia-ch'ih ching, the Chinese version…
“The Sanskrit text of the Mahāvairocana Tantra is lost, but it survives in Chinese and Tibetan translations… There are translations from both into English.” (

041 1# $aeng$kchi$hsan
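With separate subfields, the translation path in Example 4 can be read directly from the subfield codes. The following is only a sketch under the proposed coding: parse_041 and translation_chain are hypothetical helper names, and the input is assumed to be a display line in the form used in this paper (tag, indicators, then $-delimited subfields).

```python
import re

def parse_041(line):
    """Parse a display line such as '041 1# $aeng$kchi$hsan'."""
    tag, indicators, data = line.split(None, 2)
    subfields = {}
    for code, value in re.findall(r"\$([a-z0-9])([^$]*)", data):
        subfields.setdefault(code, []).append(value.strip())
    return tag, indicators, subfields

def translation_chain(subfields):
    """Original language(s), then intermediate, then the language of the text."""
    return subfields.get("h", []) + subfields.get("k", []) + subfields.get("a", [])

tag, indicators, subfields = parse_041("041 1# $aeng$kchi$hsan")
print(translation_chain(subfields))  # ['san', 'chi', 'eng']
```

Because the role of each code is carried by the subfield itself, the result does not change if the subfields are stored in a different order.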

Example 5. Explicitly coding original language for a book that does not involve a translation.

041 0# $aspa$hspa


