DATE: December 10, 2004

NAME: Subject Access to Images

SOURCE: Art Libraries Society of North America (ARLIS/NA) and Visual Resources Association (Elizabeth O’Keefe, Pierpont Morgan Library, with input from Sherman Clarke, New York University)


KEYWORDS: Subject Added Entries (BD), Field 6XX (BD), Visual Materials (BD)

RELATED: This paper discusses the possibility of changing MARC coding in order to distinguish between indexing terms for intellectual content and indexing terms for visual depictions.


12/10/04 - Made available to the MARC 21 community for discussion.

01/15/05 - Results of the MARC Advisory Committee discussion - There was not much support for the creation of a new field block. A straw poll was taken of the group. Many people supported the use of relator codes to distinguish between indexing terms for intellectual content and indexing terms for visual depictions. If relator codes are used, subfield $e is needed in fields 630 and 651 and subfield $4 is needed in fields 630, 650, and 651. It was decided that the use of relator codes should be studied more and presented in another paper. A list of functional requirements should be discussed on the MARC Forum Listserv.

Discussion Paper No. 2005-01: Subject Access to Images


Subject access is an important means of retrieving images. Data dictionaries for image databases (e.g. Categories for the Description of Works of Art, VRA Core 3.0) and guidelines for image cataloging (e.g. Cataloguing Cultural Objects) routinely include subject among the primary data elements. One writer on the topic notes that "after the element of creator/artist/maker, that of content or subject matter appears to be the most widely used in online queries for art-historical material" (1), while another concludes that subject was a determinant of relevance of art images in approximately 35 percent of art history research.(2) For scholars in other disciplines, such as history or literature, or those seeking images for illustration or commerce, subject is probably the most popular retrieval method for images.

Until recently, most image databases, even the relatively small number that used MARC, were separate from library databases, which document collections that are preponderantly textual. But there is a growing trend towards integrated access to the two types of material. This may take the form of integration within a single database of records for textual material and images. Library databases have always contained some records for image-based material, but in recent years the number of records for images has increased, as slides, prints, photographs, drawings, objects, and digital images are routinely incorporated into the cataloging workflow. Alternatively, image and text databases may remain physically separate, but be mapped to a common data dictionary and searched using a single interface. In either case, it is time to consider whether researchers are best served by MARC coding that makes no distinction between indexing terms for intellectual content and indexing terms for visual depictions.


"Subject" within the context of image cataloging means something different from “subject” within the context of library cataloging, which focuses primarily on verbal communication:

"The subject matter of a work of art (sometimes referred to as its content) is the narrative, iconic, or non-objective meaning conveyed by an abstract or a figurative composition. It is what is depicted in and by a work of art. It also covers the function of an object or architecture that otherwise has no narrative content." Categories for the Description of Works of Art (3)

"Subject. Terms or phrases that describe, identify, or interpret the Work or Image and what it depicts or expresses. These may include proper names (e.g., people or events), geographic designations (places), generic terms describing the material world, or topics (e.g., iconography, concepts, themes, or issues)." VRA Core 3.0 (4)

"A subject is that which is depicted, whether it represents all or part of an unbuilt design or built work." A Guide to the Description of Architectural Drawings (5)

"Subject. That which is pictured in, or represented by, the object (e.g., landscape, battle, woman holding child)." Object ID (6)

"The Subject element contains an identification, description, and/or interpretation of what is depicted in and by a work or image." Cataloguing Cultural Objects (7)

Definitions of subject for images focus on “of-ness”. By contrast, discussions of subject analysis within the context of library cataloging focus on “about-ness”—the intellectual content of the material analyzed:

"The "has as subject" relationship indicates that any of the entities in the model, including work itself, may be the subject of a work. Stated in slightly different terms, the relationship indicates that a work may be about a concept, an object, an event, or place; it may be about a person or corporate body; it may be about an expression, a manifestation, or an item; it may be about another work. The logical connection between a work and a related subject entity serves as the basis both for identifying the subject of an individual work and for ensuring that all works relevant to a given subject are linked to that subject." Functional Requirements for Bibliographic Records (8)

"Subject analysis is the part of cataloging that deals with determining what the intellectual content of an item is "about", translating that "aboutness" into the conceptual framework of the classification or subject heading system being used …" Wynar, Introduction to Cataloging and Classification (9)

The same distinction is made in ordinary speech: we refer to an image "of" Lincoln, "of" a horse, or "of" the sinking of the Titanic, but to a monograph, article, or novel "about" Lincoln, "about" a horse, or "about" the sinking of the Titanic. (The "about" relationship is less straightforward for fiction than for non-fiction, but would be closer to "about" than to "of".) A sample question from an image researcher might be: "Do you have any images of people playing musical instruments?" while someone not engaged in image research would be likely to ask: "Do you have any material about musicians in art?"

"About-ness" does appear as a concept in theoretical discussions of subject matter in images: "By their very nature, most pictures are ‘of’ something; that is, they depict an identifiable person, place, or thing ... In addition, pictorial works are sometimes ‘about’ something; that is, there is an underlying intent or theme expressed in addition to the concrete elements depicted." (10)

The Categories for the Description of Works of Art identifies an "about-ness" subcategory, "Interpretation," for subject matter, defining it as "the meaning or theme represented by the subject matter or iconography of a work of art."(11) Other data dictionaries recognize the distinction in theory but do not recommend treating it as a separate element, and most of the visual resources curators who responded to an informal survey on the Visual Resources Association listserv reported that they make no distinction between of-ness and about-ness when assigning subject terms to images. For the purposes of this paper, all aspects of the subject content of images will be considered together.


Although there are times when a catalog user wishes to find everything related to a person, place, or thing, regardless of format, most searchers wish to retrieve either images OR text, but not both at once. As long as images and textual materials are confined to separate databases, or there is only a small amount of overlap within a database, searchers can be reasonably sure of retrieving only what they want.

As we move towards integrated searching of databases containing multiple formats, how can we enable users to restrict a search to works about, or to images of, a given person, place, or thing? The easy part will be identifying records that come from purely image or purely textual databases (though there are probably a diminishing number of the latter). But there is currently no easy way to restrict a search within a MARC-based mixed database to images only, or text only. Here are a few of the most obvious methods for doing this.

1. Limit by location

In many institutions, the image collections are administratively distinct from and housed separately from the textual collections. This allows systems to offer limits by location that are de facto limits by material type. For example, a catalog may offer users the ability to limit searches to the Slide Collection, or the Photograph Collection. This solution works as long as the holdings of a location consist solely of images, or solely of texts. But it breaks down when the holdings of the location represent a mixture of text-based formats and image-based formats, as is the case with many archival and special collections. It will also fail to identify material containing a substantial amount of illustration within books and manuscripts. Finally, it may not be possible to limit by sublocation in this way when searching several external catalogs simultaneously.

2. Limit by material type

Most library systems offer the ability to limit a search by the material type encoded in the Leader and/or by values in the 006 or 007 fields. Limiting a subject search for "horses" by the material types "Projected Medium," "Two-Dimensional Non-Projectable Graphic," and "Three-Dimensional Artifact or Naturally Occurring Object" (the three most obvious choices for someone seeking images) will exclude books, periodicals, non-musical recordings, etc. about the topic. However, it is not quite accurate to say that this search will find all records for images depicting horses. Many records for mixed collections that contain text and images are coded simply as p, with no additional 00X fields to bring out the presence of image-based material types. These will be excluded. There is also the potential for false hits, even when a record for mixed materials contains 00X fields for image types. A subject search for "horses" limited by the material types associated with images will find a record for a mixed collection that contains textual material about horses but images of dogs.

Subject searches limited by material type will also fail to retrieve images of horses contained in illustrated books, since the presence of illustrations in books is indicated by values in field 008/18-21, not by material type codes. This could result in a search failing to retrieve publications that consist principally of images, such as a published volume of art reproductions or a facsimile of an illuminated manuscript or block book (all coded solely as material type a, "Language material"). Manuscripts which are predominantly textual but which contain significant amounts of illustration would also be excluded from the search, since they are coded as t, "Manuscript language material".
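The gaps described in this section can be illustrated with a minimal sketch. It assumes a toy list of records that carry only a title and the Leader/06 type-of-record code, all indexed under "Horses"; the record titles, the IMAGE_TYPES set, and the limit_by_material_type function are hypothetical and do not correspond to any particular library system.

```python
# Leader/06 codes a searcher would most plausibly choose when seeking images.
IMAGE_TYPES = {"g", "k", "r"}  # projected medium, 2-D nonprojectable graphic, 3-D artifact

# Hypothetical records: (title, Leader/06 code). Assume each has the subject "Horses".
RECORDS = [
    ("Slide set of horses", "g"),
    ("Photograph of a horse", "k"),
    ("Published volume of horse paintings", "a"),   # language material
    ("Illustrated manuscript on horses", "t"),      # manuscript language material
    ("Family papers with horse photographs", "p"),  # mixed materials, no 00X fields
]


def limit_by_material_type(records, wanted_types):
    """Keep only the titles whose Leader/06 code is in wanted_types."""
    return [title for title, code in records if code in wanted_types]


hits = limit_by_material_type(RECORDS, IMAGE_TYPES)
# Records that contain images of horses but fall outside the limit.
missed = [title for title, _ in RECORDS if title not in hits]

print("Retrieved:", hits)
print("Missed:", missed)
```

In this sketch the volume of reproductions, the illustrated manuscript, and the mixed collection all contain images of horses, yet none of them survives the limit.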

3. Use indexing vocabulary to differentiate

Subject thesauri such as LCSH and Canadian Subject Headings include some headings and subdivisions indicative of pictorial content. Examples include:

Cats$vPictorial works
Cats in art
Lincoln, Abraham,$d1809-1865$vPortraits
Lincoln, Abraham,$d1809-1865$vCaricatures and cartoons

Recently, a new relator code, dpc (Depicted), has been defined for "a person or organization depicted or portrayed in a work, particularly in a work of art." If this definition were expanded to include all categories of entities, and a subfield $e were defined for the 651 and 630 fields, the term could be used to qualify all the different types of entity depicted in images. The use of a single term to denote all types of images might better meet the needs of those searching for images regardless of format.
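A rough sketch of how a system might use the relator code follows. It assumes the expanded definition suggested above (dpc applicable to any entity type, and a relator subfield available in every subject field), which is not current MARC practice; the SubjectField class, the depicted function, and the sample headings other than Lincoln's are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubjectField:
    tag: str                        # e.g. "600", "650", "651"
    heading: str                    # text of the heading
    relator: Optional[str] = None   # relator code such as "dpc"; None if absent


def depicted(fields: List[SubjectField]) -> List[str]:
    """Return the headings flagged with the relator code dpc (Depicted)."""
    return [f.heading for f in fields if f.relator == "dpc"]


# Hypothetical record mixing depicted entities with an ordinary topical heading.
record = [
    SubjectField("600", "Lincoln, Abraham, 1809-1865", relator="dpc"),
    SubjectField("650", "Horses", relator="dpc"),
    SubjectField("650", "Presidents"),  # no relator: treated as "about"
]

print(depicted(record))
```

Because the test is on a single code rather than on a list of vocabulary terms, the same filter would serve for persons, places, titles, and topics alike.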

In other subject vocabularies, terms for pictorial material are stand-alone terms, which the searcher can combine with subject terms in keyword searches. For example, MeSH includes publication types such as "Pictorial works" and "Drawings" that can be combined with subject terms:

cholera and pictorial works (pt)

Some cataloging agencies prefer to bring out the presence of pictorial material by using terms drawn from form and genre lists to supplement the subject headings. For example, a record may contain two headings:

650 #0$aHorses.
655 #7$aPhotographs.$2aat

There are several drawbacks to a vocabulary-based approach to retrieving images by subject. When material types are incorporated in precoordinated headings, they can be ambiguous: does a heading such as "Horses in art" or "Horses--Photographs" refer to discussions of artistic representations of horses, or to collections of reproductions of horses in art works? Coding ($x for topical treatment, $v for form) helps catalogers distinguish between the two, but is lost on most catalog users. When the material types are separate terms which are searched in combination with subject terms, they can retrieve false hits (in the example above, do the headings refer to images of horses, or to textual material about horses and photographs of something entirely different?)

Another objection to relying on vocabulary is that it puts the burden of retrieval on our users. Subject browses depend on the searcher’s ability to recognize all the qualifiers that might signal the presence of images in the items cataloged. If the catalog contains a great deal of material on a topic, the subject headings for pictorial content might appear several screens away from the base term; the searcher who finds "Horses" on the first screen may not persevere long enough to find "Horses in art" or "Horses--Pictorial works."

Unlike subject browses, keyword searches, preferred by most users, work on both precoordinated and uncoordinated headings. However, they require the searcher to enter all possible qualifiers in order to find all images. It is hard to imagine anyone except an information professional devising a search such as:

horses and (pictorial or photographs or art or depicted)

or:

lincoln and (cartoons or portraits or photographs or depicted)

In any case, keyword searches will turn up many false hits. For example, a search in the Library of Congress catalog for the subject keywords battle antietam pictorial retrieves a record for the Keidel family papers, which contain letters about the battle, and a scrapbook containing illustrations of birds and ornithological material.
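The way such false hits arise can be sketched in a few lines. This is a simplified model, assuming a keyword index that pools the words of every subject heading in a record; the subject_keywords and matches functions and the three sample records are hypothetical.

```python
def subject_keywords(headings):
    """Pool the lowercased words of all subject headings in one record."""
    words = set()
    for heading in headings:
        for word in heading.lower().replace("--", " ").split():
            words.add(word.strip(".,"))
    return words


def matches(headings, base, qualifiers):
    """True if the record has the base term AND at least one qualifier."""
    words = subject_keywords(headings)
    return base in words and any(q in words for q in qualifiers)


QUALIFIERS = ["pictorial", "photographs", "art", "depicted"]

# Hypothetical records, each reduced to its list of subject headings.
images_of_horses = ["Horses--Pictorial works"]
text_with_other_photos = ["Horses", "Photographs"]  # text about horses, photos of dogs
plain_text = ["Horses"]

for name, rec in [("images of horses", images_of_horses),
                  ("text plus unrelated photographs", text_with_other_photos),
                  ("text only", plain_text)]:
    print(name, "->", matches(rec, "horses", QUALIFIERS))
```

The second record is retrieved even though its photographs do not show horses, because the keyword index cannot tell which heading a qualifier belongs to.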

Once the universe widens beyond library databases to image databases, using vocabulary to differentiate between text and image content will be even less effective. Image databases are unlikely ever to adopt indexing terms that would have to be applied to every item in their collections. The Thesaurus for Graphic Materials does not use "Pictorial works" or "In art" because "For collections consisting largely or entirely of pictorial works, both of these techniques lose their meaning" (12), a sentiment that would undoubtedly be shared by other image databases.

4. Defining a new code within MARC to distinguish subject as pictorial depiction from subject as intellectual content

Using MARC encoding to make this distinction has a lot to recommend it. MARC coding is language- and vocabulary-independent. Unlike approaches that rely on limiting by location, material type, or vocabulary, coding at the field level would facilitate the definition of separate indexes and distinctive displays. The ability to create distinctive displays would be particularly helpful for records in which some indexing terms apply only to textual elements and some only to pictorial elements, for example, when a record for a piece or a collection of sheet music contains indexing terms for both the type of music and the images depicted on the cover, or a record for a single manuscript leaf contains indexing terms for the content of the text and for the pictorial content of a marginal illustration.
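As an illustration of what field-level coding would make possible, the sketch below builds two separate indexes from the same records. It assumes only that each indexing term can be marked as "about" or "of" by whatever MARC mechanism is eventually chosen; the build_indexes function, the record titles, and the term "Waltzes" are hypothetical.

```python
from collections import defaultdict


def build_indexes(records):
    """Split indexing terms into an 'about' index and an 'of' (depiction) index."""
    about_index = defaultdict(list)
    of_index = defaultdict(list)
    for title, headings in records:
        for term, kind in headings:
            target = of_index if kind == "of" else about_index
            target[term].append(title)
    return about_index, of_index


# Hypothetical records: (title, [(term, "about" or "of"), ...]).
records = [
    ("Sheet music with illustrated cover", [("Waltzes", "about"), ("Horses", "of")]),
    ("Treatise on horse breeding", [("Horses", "about")]),
]

about_index, of_index = build_indexes(records)

print("About horses:", about_index["Horses"])
print("Images of horses:", of_index["Horses"])
```

A system could equally merge the two indexes for display; the coding only makes the separation available to institutions that want it.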

Unfortunately, the benefits of such an approach are more obvious than the method that could be used to achieve it. The techniques that come immediately to mind are defining a new block of field tags, using an indicator within the existing block of subject fields, or developing a new method entirely.

The current list of fields defined as subject fields within the MARC bibliographic format includes the following:

(653, 655, 656, 657, and 658 are omitted from this list, since they are identified simply as Index terms, rather than Subject terms).

Since each of these fields defines an entity type that might be required for indexing the pictorial content of images, and since authority control of the terms used for indexing images would be desirable, sixteen new fields (eight standard, eight local) would be needed for subject access to visual content. It is probably impossible to find an unused block consisting of X00, X10, X30, etc., unless the 9XXs were used, and these are already reserved for local information. The only option would be to choose an unused block in the 6XX field, and replicate the order in which the tags are assigned to the various entity types in the current subject fields, e.g.:

As with the fields currently devoted to subject access, individual institutions would be able to determine the granularity and specificity of indexes and displays for subject access to images. Some institutions might wish to use a single index and label to represent all types of image content, some might choose to differentiate, e.g. "Images of persons:" versus "Images of places:".

Another approach would be to use an indicator within the existing field tags to make the distinction. The use of an indicator is appealing, because it would allow the retention of the current field tags, and make for minimal disruption to heading validation. Unfortunately, this could not be an across-the-board solution, since both indicator positions are already in use for these fields, while all numeric values within the second indicator for the 630 field are already defined.

Defining a subfield $7 for these fields is another possibility. As used within the 76X-78X fields, $7 uses codes to give more information, such as material type and bibliographic level, for the related resource cited in the field. Conceivably it could be used in the subject fields to code a subject term as being used for material about or images of. It would be more difficult to make use of this information to generate distinctive displays or indexes.

There does not appear to be any easy solution to this problem within the MARC format, but helping users locate images is something that we are increasingly going to be asked to do, so we ought to begin to consider a solution.

Finally, it should be stressed that if a change is made to MARC to enable us to distinguish between intellectual content and image content, institutions are at liberty not to implement it. The change, if one is made, would be comparable to the introduction of a separate MARC field for genre/form headings. Prior to the definition of the 655 field, the 650 field was used for both material about and examples of various genres. The definition of the new field made it possible to distinguish, but did not obligate users to do so; many catalogers, especially music catalogers, continue to use the 650 field for subject, form, and genre. Even institutions which choose to differentiate in coding may still prefer to lump the two types of subject together for purposes of display and indexing. What the change would do is to empower cataloging agencies which wish to make this distinction because it is important for their users.

Questions for discussion:

1. How important is the distinction between intellectual content and pictorial depiction?

2. Are there analogies from other formats, e.g. sound recordings, moving pictures? Would a sound recording of bird song be considered a depiction of bird song, rather than about birds? Do subject headings such as "Cats$vSongs and music" reflect the content of the lyrics (and therefore count as being "about")? Do catalogers of films distinguish between about-ness and of-ness? What about the difference between news footage, as opposed to films with conventional story lines?

3. Is there anything to be learned from the application of subject terms to fiction? Should we be thinking in terms of a dichotomy between text and image, or along some other axis?

4. Impact on retrospective cataloging: The definition of a new method for encoding information about pictorial content will obviously require some adjustment to existing records for those who wish to adopt the new field type. Are there advantages/disadvantages to the various techniques described, in terms of how easy it would be to implement global changes?

5. What impact would this have on the authority format?


