Meeting Summary: Users and Uses of Bibliographic Data
Past Meeting Resources for the Working Group on the Future of Bibliographic Control
March 8, 2007
Google, Inc. | 1500 Plymouth Street | Mountain View, California 94043
written by Nancy J. Fallgren, College of Information Studies, University of Maryland
Consultant to the Working Group on the Future of Bibliographic Control
The purpose of this first public meeting of the Working Group is to gain insight into the requirements of the bibliographic record and bibliographic control in the context of users and usage. The following brief summary of the Users and Uses Meeting highlights particularly relevant portions of some presentations and recurring themes, as well as some more specific requests for change. Fuller documentation will be prepared as part of the Working Group’s final report.
Based on the meeting presentations and comments, two main information user and use environments for bibliographic data are apparent: a consumer environment and a management environment. The consumer environment relates to the end-user of the bibliographic data, the information consumer, as described by Karen Markey and Timothy Burke, and services that are designed to assist the end-user in finding relevant information, from search engines to specialized catalog interfaces. The management environment pertains to resource collection management. Although these two environments represent different perspectives on bibliographic data, they are interrelated: for example, data recorded primarily for one environment may also be of use to the other. The creation of authoritative bibliographic data is still necessary to support both environments; however, current bibliographic data do not fully meet the needs of either environment.
The consumer environment consists of end-users searching for information resources. These consumers require bibliographic data to assist them in finding, identifying, selecting and obtaining information resources, in English as well as other languages. Both Markey and Burke reveal that users need additional, richer data and that the bibliographic catalog is merely one of many sources they use to find information. In addition to multiple sources of information, there are multiple access tools for information discovery. End-users employ a variety of general purpose and custom tools to find relevant information. The tools range from general search engines that rely on keyword access to more specialized systems customized for the library environment, such as faceted browser interfaces. In addition, discussions about and designs for bibliographic data must not be couched only in terms of effectiveness in English-language searches and structures.
Karen Markey’s report to the Working Group identifies three main factors that affect consumer use of bibliographic data: system knowledge, domain expertise, and procedural knowledge. In a chart of these factors, she identifies the large majority of users (77%) as having low system knowledge and low domain expertise/procedural knowledge (“double novices”), while at the other end of the scale, only 0.5% of users have high system knowledge and high domain expertise/procedural knowledge (“double experts”). Markey advises that retrieval systems and enhancements to bibliographic control need to focus on helping double novices. Specifically, Markey suggests that information about discipline, appropriate knowledge level, authority of the author, genre/literary nature, how the content is accessible, and other user reviews and ratings, should be added to the bibliographic record.
In identifying the different research approaches he takes for different information needs, Timothy Burke’s presentation underscores the reality that one person can embody different levels of expert and novice knowledge at any given time. Burke’s various research personae each have a different purpose in seeking information and, thus, different information seeking behaviors, based primarily on his domain expertise. Using Markey’s terminology, a university professor might be a double expert in searching for information within his narrow academic specialization, but a double novice in helping a student with a research project outside that specialization. Therefore, Burke identifies seven “tools” that would assist him (in all his personae) in performing research, some of which mirror Markey’s enhancements:
- the ability to recognize clusters of knowledge production (persons and subjects),
- the lineage of publications (i.e., how they exist in chronological relationship to each other),
- the ability to make previously unknown connections among resources,
- the ability to make serendipitous or unforeseen connections among topics,
- identification of the authoritativeness of sources,
- the popularity/amount of use of a resource, and
- the sociology of knowledge, for example the "pedigree" of authors and publishers.
It is clear that there will be human and programmatic consumers of bibliographic data. Tony Hammond discussed various ways in which data were being made available as 'microformats', where particular semantics could be encoded in Web pages for consumption by browser extensions or other user agents.
Bibliographic data generally are not directed in raw form to the consumer, but are manipulated computationally in a variety of ways, some more sophisticated than others, and reformatted to facilitate or enhance the consumer experience. In general, bibliographic control can support these functions with more and better authority control for consistent data, the ability to evaluate and distinguish among resources, and schemes or formats that enable better interoperability among disparate collections.
Enhancing or facilitating the consumer experience can include making the bibliographic data work harder for the user or creating relationships between the bibliographic data and other systems. North Carolina State University’s faceted browser interface, as described by Andrew Pace, is an example of making bibliographic data work harder by decomposing it into facets to guide the user’s search. The usefulness of faceted browsing depends on the ability of the bibliographic data to support it, e.g., on the consistent application of controlled vocabularies. Pace’s specific bibliographic data “wish list” is for
- a classification scheme or subject thesaurus that enables faceted classification,
- a work identifier for books and serials,
- better name authority for organizations, and
- physical description that includes height, width and weight of an item (to assist remote storage management).
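As a rough illustration of why faceted browsing depends on consistently applied controlled vocabularies, the following Python sketch counts facet values across a handful of records and narrows the set by one chosen value. The record fields, the sample data, and the function names `facet_counts` and `narrow` are assumptions made for this example; they do not describe the North Carolina State University system.

```python
from collections import Counter

# Hypothetical, heavily simplified records; real catalog records carry far more data.
RECORDS = [
    {"title": "Introduction to Cataloging", "subject": ["Cataloging"],
     "format": "Book", "language": "English"},
    {"title": "Metadata Basics", "subject": ["Metadata", "Cataloging"],
     "format": "E-book", "language": "English"},
    {"title": "Catalogacion descriptiva", "subject": ["Cataloging"],
     "format": "Book", "language": "Spanish"},
]


def _values(record, facet):
    """Return the facet values of a record as a list (a single string becomes a one-item list)."""
    values = record.get(facet, [])
    return [values] if isinstance(values, str) else list(values)


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    counts = Counter()
    for record in records:
        # Use a set so a value repeated within one record is counted once.
        counts.update(set(_values(record, facet)))
    return counts


def narrow(records, facet, value):
    """Keep only the records whose facet contains the chosen value."""
    return [r for r in records if value in _values(r, facet)]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "subject"))
    print([r["title"] for r in narrow(RECORDS, "language", "Spanish")])
```

If the same subject were entered under variant spellings, the counts would split across several facet values, which is the practical reason consistent vocabulary matters here.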
Google Book Search, Google Scholar, and the University of California all propose using bibliographic data from other sources to enhance and/or simplify the consumer experience. Google Book Search includes a link to OCLC’s WorldCat to find a specific book in a local library or to access more bibliographic data about that book, as well as a link to Amazon to buy the book. Similarly, Google Scholar can link books and journal articles to individual library collections where a user can check out resources or access full text online. These initiatives help centralize and simplify the search process under one gateway interface. Similarly drawing upon other sources of data, the University of California Libraries’ Bibliographic Services Task Force proposes creating minimal bibliographic records and enhancing them with metadata from other sources, such as content from publishers’ data. In all three strategies, data must be transferable from one platform to another, and unique resource identifiers are vital to correctly linking the same resources among different platforms.
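The role of a shared identifier in this kind of enrichment can be sketched in a few lines of Python. The example assumes that both the minimal record and the external metadata carry an ISBN and that the identifier only needs hyphens and spaces removed before comparison; the field names, the sample values, and the functions `normalize_isbn` and `enrich` are hypothetical and are not taken from any of the systems described above.

```python
def normalize_isbn(isbn):
    """Strip hyphens and spaces so the same ISBN written two ways compares equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def enrich(minimal_records, external_records):
    """Add fields from external metadata to minimal records that share an ISBN.

    Fields already present in the minimal record are kept; only missing
    fields are filled in from the external source.
    """
    by_isbn = {normalize_isbn(r["isbn"]): r for r in external_records}
    enriched = []
    for record in minimal_records:
        merged = dict(record)  # copy, so the minimal record is left untouched
        extra = by_isbn.get(normalize_isbn(record["isbn"]), {})
        for field, value in extra.items():
            merged.setdefault(field, value)
        enriched.append(merged)
    return enriched


# Hypothetical sample data: the identifiers and descriptions are made up.
MINIMAL = [
    {"isbn": "978-0-00-000000-2", "title": "An Example Title"},
    {"isbn": "978-0-00-000001-9", "title": "Another Example"},
]
PUBLISHER = [
    {"isbn": "9780000000002", "title": "EXAMPLE TITLE, AN", "summary": "A short description."},
]

if __name__ == "__main__":
    for record in enrich(MINIMAL, PUBLISHER):
        print(record)
```

Without an identifier that both sides record in a comparable form, the match would have to fall back on titles or names, which is far less reliable.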
One of the challenges before the library community is handling the vast amount of content that is being generated. The traditional approach to generation of bibliographic information through trusted authorities starts to break down as the amount of content grows, while at the same time, a large variety of alternative approaches are being developed to describe, characterize, and classify this information. As a result, a balance needs to be struck between more authoritative assertions and assertions that might be made through a diverse range of sources, including users. There are gradations of authority in user-generated metadata, which imply a need for informed usage as it may affect relevance judgments.
Several speakers discussed user-generated metadata that occurs on the Web through tagging, folksonomies, and other such mechanisms. For example, Burke commented that he would like to be able to identify clusters of inter-textuality, but that neither Library of Congress Subject Headings (LCSH) nor folksonomies do this very well. Rather, he suggested the possibility of seeing how scholars have tagged each other; however, Burke also cautioned that this might lead to social hacking, which would taint its usefulness. These sentiments were echoed by Bernie Hurley, who stated that the University of California Libraries’ Bibliographic Services Task Force recommends eliminating LCSH in favor of keywords and social tagging by faculty and subject selectors, experts in their fields. Tony Hammond also showed some applications of tagging in a scholarly environment. Again, this lacks the trusted authority of bibliographic control, but could prove more meaningful to users.
As stated previously, the management environment is concerned with managing resource collections. This environment seeks efficiency in data production, data structures, and cost through interoperability and the unified management of both multiple resource platforms and multiple institutions.
Bernie Hurley’s presentation about MARC highlighted some of the inefficiencies of the MARC record, including redundancy, the cost of maintenance, and the complexity of indexing. He also noted that the University of California Libraries are creating separate parallel structures of metadata that are more efficient for other objects, rather than forcing these objects into a pre-existing inappropriate format. This underscores one view that MARC is not necessarily too complex, but that perhaps there are too many expectations being placed on it. One way of handling this issue might be, as Hurley described, to create a minimal MARC record enhanced with metadata from other sources.
The management model identified by Oren Beit-Arie suggests a changing environment at the workflow level and the inventory level, both driven by a need for unified resource management systems that provide interoperability among libraries and collections. At the workflow level, more libraries are combining their resources in both formal and informal consortia. For these groups, bibliographic data are viewed from two perspectives: the data that are common for all members at the collaborative level and the data that are unique to an individual member at the local level. This suggests a workflow model where a base bibliographic record can be entered at the consortium level and enhanced as needed at the local level.
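One way to picture the two-level workflow is a shared base record that each member library overlays with its own local data. The Python sketch below is only an illustration under that assumption: the record identifier, the library names, the field names, and the function `local_view` are all hypothetical and do not represent any vendor's data model.

```python
# Shared records maintained once at the consortium level (hypothetical data).
BASE_RECORDS = {
    "rec-001": {"title": "An Example Title", "author": "Example, Author"},
}

# Data that each member library maintains for itself (hypothetical data).
LOCAL_ENHANCEMENTS = {
    "library-a": {"rec-001": {"call_number": "Z693 .E93", "location": "Main stacks"}},
    "library-b": {"rec-001": {"location": "Remote storage", "local_note": "Signed copy"}},
}


def local_view(record_id, library):
    """Combine the shared base record with one library's local enhancements.

    The base record is copied, not modified, so every member still sees the
    same shared description underneath its own local fields.
    """
    view = dict(BASE_RECORDS[record_id])
    view.update(LOCAL_ENHANCEMENTS.get(library, {}).get(record_id, {}))
    return view


if __name__ == "__main__":
    print(local_view("rec-001", "library-a"))
    print(local_view("rec-001", "library-b"))
```

Keeping the local fields separate from the base record means a correction made at the consortium level reaches every member without disturbing local holdings data.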
Traditionally, libraries manage inventory and bibliographic resources at the level of bundles, whether an anthology of works, a journal, or a single work. This has become insufficient to meet user expectations of more granularity in bibliographic description and to handle an inventory that increasingly consists of electronic formats that are more fluid and accessible at a more granular level. Providing unified management of these disparate collections is the challenge, and interoperability of bibliographic data is important to success.
Beit-Arie’s comments specific to bibliographic control mirror those of other speakers, e.g., the need for version distinction and the value of controlled vocabularies. He also noted some bibliographic data that are problematic or not handled well in bibliographic control, including
- encoded data in the 006, 007, 008 and leader fields,
- uniform titles,
- multi-language resources, and
- multiple unique identifiers.
Beit-Arie also noted that bibliographic data should be formatted so as not to hinder the ability of system vendors to manipulate data computationally for user displays.