The Library Catalogue in a Networked Environment

Tom Delsey
Director General
Corporate Policy and Communications
National Library of Canada
395 Wellington St.
Ottawa, Canada K1A 0N4

Final version

With the migration of the library catalogue to a networked environment there have been a number of significant technological changes in the way cataloguing data is accessed and utilized. As the OPAC has been supplemented by other technologies (search and retrieval protocols, browsers, search engines, and resolution services), the interfaces between the catalogue and the user, between the catalogue and the library collection, and between the catalogue and other sources of data on the network have become increasingly complex, both in the way they are structured and in the level of functionality and interoperability that they support. To understand more fully the way the catalogue functions in a networked environment, and how its functionality can be optimized, it is important to view the catalogue not simply as a data store, but more broadly as the interaction between that data store and a growing range of networked applications that interface with the catalogue.

This paper is intended to do just two things. The first is to sketch out in broad terms the impact that technological change over the past few decades has had on a number of key interfaces to the library catalogue. The second is to highlight, again in fairly broad terms, certain aspects of those interfaces that will need to be analyzed more closely as we endeavour to make the library catalogue a more effective tool for accessing networked resources. My purpose is simply to help establish a frame of reference or context for some of the more specific needs, challenges, and potential solutions that will be addressed in greater detail in the dozen or so papers that follow.

The Impact of Technology on the Interfaces

There are two interfaces that are absolutely integral to the functioning of the library catalogue: the interface with the user and the interface with the resources described in the catalogue. It is through those interfaces that the catalogue fulfills its primary function of facilitating access to the library's collection. There are, however, two other key interfaces that come into play as a means of supplementing the functionality of the catalogue. One is the interface between the catalogue and the tools produced by abstracting and indexing services. The other is the interface between the local catalogue and the union catalogue. The A&I interface serves to supplement the level of content analysis that is provided by the catalogue itself. The union catalogue interface has served to supplement the reach of the catalogue, facilitating access to the library's collection for a wider group of users than the library's direct clientele.

In the transition of the library catalogue from its card format to the OPAC, and the subsequent migration of the OPAC to the Internet and the Web, there have been significant impacts on all four of the interfaces to the catalogue. A brief overview of the changes that have occurred with respect to each of the interfaces will serve to highlight how significant some of those changes have been and what kind of challenges we face in adapting the interfaces to a new technological environment.

The User Interface

The most obvious impact of online technology on the user interface with the library catalogue has been the extension of access. A machine-readable database of catalogue records, by effectively eliminating the physical constraints associated with the card catalogue, brings with it the potential to give the user access to virtually any element of data within the catalogue. With online access to the catalogue, the traditional access points provided in the card catalogue have been supplemented through the indexing of a variety of additional data fields, extending the scope of the user's searching capability significantly. Computer indexing has also served to extend the functionality of the individual access point. Permutation of conventionally structured headings makes it possible to search the catalogue not only using the lead element in such headings, but using any sub-element of the heading, whether it be the name of a corporate body recorded as a sub-heading, or a form subdivision used in a subject heading. Keyword indexing has extended the search capability even further. And the addition of Boolean search functions has given the user the capability of extending or narrowing a search in ways that simply were not possible with the fixed structure of the card catalogue.
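By way of illustration, the extension of access described above can be sketched in a few lines of Python. The sample records, the " -- " separator between heading elements, and the function and variable names are all assumptions made for the example; the sketch does not describe the indexing of any particular OPAC.

```python
from collections import defaultdict

# Hypothetical sample records; " -- " separates the elements of a heading.
RECORDS = {
    1: {"title": "Annual report",
        "headings": ["Canada -- National Library -- Periodicals"]},
    2: {"title": "Library automation in Canada",
        "headings": ["Libraries -- Automation -- Canada"]},
    3: {"title": "Card catalogue filing rules",
        "headings": ["Catalogues -- Filing rules"]},
}


def build_indexes(records):
    """Return (heading_index, keyword_index), each mapping a term to record ids."""
    heading_index = defaultdict(set)
    keyword_index = defaultdict(set)
    for rec_id, rec in records.items():
        for heading in rec["headings"]:
            for element in heading.lower().split(" -- "):
                # Permutation: every sub-element is an access point,
                # not only the lead element of the heading.
                heading_index[element].add(rec_id)
                for word in element.split():
                    keyword_index[word].add(rec_id)
        for word in rec["title"].lower().split():
            keyword_index[word].add(rec_id)
    return heading_index, keyword_index


heading_index, keyword_index = build_indexes(RECORDS)

# Access by a sub-heading that a card catalogue would file only under "Canada".
by_subheading = heading_index["national library"]

# Boolean operators reduce to set operations on the keyword index.
narrowed = keyword_index["canada"] & keyword_index["library"]         # AND
broadened = keyword_index["catalogue"] | keyword_index["automation"]  # OR
excluded = keyword_index["canada"] - keyword_index["automation"]      # NOT
```

The point of the sketch is simply that once the data is machine-readable, each additional access point costs an index entry rather than another filed card.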

Online technology has also had a significant impact on the way catalogue data is displayed. The conventional "unit record" display of the card catalogue has been displaced by what is typically a graduated display starting with one or more "results set" screens, from which the user is given several options for the display of individual records, ranging from some form of brief record, through a full record in a conventional catalogue entry format, to a display of the record with all its MARC coding. In addition, the online catalogue offers the user a range of options for sub-arranging the records that form the results set for the search, as well as the capability of combining results sets.

These new capabilities for both search and display of catalogue data have had the effect of substantially altering the underlying structure of the library catalogue. The structure of the card catalogue was effectively pre-determined by the form in which headings and references were cast, by the format of the "unit record," and by the conventions used for filing individual entries within an established sequence. The standardization of cataloguing rules, card formats, and filing rules served to establish a uniform structure for the card catalogue that was in all essential respects consistent from one library to another. With the introduction of the online catalogue all that was changed. The opportunities that the technology provided for extending access to the data stored in the catalogue and expanding the range of display options led to innovations in the design of the user interface that have not only altered the nature of the catalogue as a search and retrieval tool, but have effectively displaced what had been a common structure with a multiplicity of structures.

From the user's perspective, this migration of the library catalogue to an online environment meant that the interface with the catalogue had to be re-learned. What had been a relatively simple tool, the structure of which could be understood more or less intuitively, and the use of which required little technical skill, had been displaced by a tool that was considerably more complex in its design and utilized a new technology that in itself required the user to develop a new skill set.

The second stage in the migration of the catalogue, from what was effectively a "local" online environment to a fully networked environment, has brought with it a new set of challenges. The innovation that was sparked with the introduction of online technology, and the wide-ranging variations in the design of database structures, indexing methods, and systems functionality that have ensued, have made the design and implementation of user interfaces in a networked environment all the more complex. In the "local" online environment, the user interface was designed to function within the context of a particular database structure, a specific set of indexed data elements, a defined set of processing capabilities, and an established range of functionality at the desktop. In a networked environment the potential for variability within those interface dependencies is virtually infinite.

The Resource Interface

The migration of the catalogue to an online environment has until recently had a relatively minor impact on the interface between the catalogue and the resources it describes. The reason, of course, is that prior to the more recent development of Internet and Web technologies, the interface between the catalogue and the collection of resources it described has had to bridge what are effectively two separate environments. As long as the library collection itself remained exclusively a physical collection stored on shelves and in cabinets on the library premises, any direct connection between the catalogue and the collection was impossible. As a result, the resource interface continued to function in the same way as it had prior to the computerization of the catalogue. That is to say that the interface continued to be dependent on a locally assigned data element in the form of a call number or shelf number appended to the catalogue record that served to identify the location of the item described within the collection as a whole, but otherwise provided no direct link to the resource.

More recently, as libraries have begun to add networked resources to their collections, it has become feasible to create a direct link between the catalogue record and the resource described. To this point, most of those links have been established by means of a Uniform Resource Locator (URL) that functions, through Internet protocols, as an accessible address for the resource. The link is effective, of course, only as long as the address remains valid. And therein lies the first challenge.

To be effective in supporting the link from the catalogue to the resource described, the identifier on which that link is based must remain valid over time. As library collections become increasingly "virtualized," maintaining the validity of the identifier becomes increasingly problematic. For resources that are stored on a server under the direct control of the library, the continuing validity of the identifier can be achieved through effective management of the library's own repertoire of URLs. But for resources that are stored on servers outside the library's control, the continuing validity of the link is entirely dependent on the data management practices of the host organization. And that is as true for any identifier that labels itself as a "persistent" identifier (such as a PURL or a DOI) as it is for a simple URL. If the host organization fails to maintain the link between the resource and any identifiers that have been used to support the link to that resource over time, those identifiers simply will not work, regardless of whether they purport to be persistent or not.
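The dependency can be made concrete with a minimal sketch in Python. The Resolver class, the identifier "example-id/1234", and the host names are hypothetical; the sketch models a resolution table in the abstract and does not reproduce the workings of any actual PURL or DOI service.

```python
class Resolver:
    """Toy resolution table mapping a stable identifier to a current location."""

    def __init__(self):
        self._table = {}

    def register(self, identifier, url):
        """Create or update the mapping for an identifier."""
        self._table[identifier] = url

    def resolve(self, identifier):
        """Return the registered URL, or None if nothing is registered."""
        return self._table.get(identifier)


def follow_link(identifier, resolver, live_urls):
    """Return the URL reached from a catalogue link, or None if the link is dead."""
    url = resolver.resolve(identifier)
    return url if url in live_urls else None


resolver = Resolver()
resolver.register("example-id/1234", "http://host-a.example/articles/1234")
live_urls = {"http://host-a.example/articles/1234"}
before_move = follow_link("example-id/1234", resolver, live_urls)

# The host moves the resource but does not update the resolution table.
live_urls = {"http://host-b.example/archive/1234"}
after_move = follow_link("example-id/1234", resolver, live_urls)

# Only when the mapping is maintained does the identifier work again.
resolver.register("example-id/1234", "http://host-b.example/archive/1234")
after_update = follow_link("example-id/1234", resolver, live_urls)
```

In the sketch the identifier itself never changes; what determines whether the link works is whether someone has kept the table current.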

A second challenge to the resource interface arises from changes in the nature of ownership in the collection that are the result of extending the library collection to encompass networked resources. Traditionally libraries have served their users by making items in their collections available for onsite use, for loan, or under certain circumstances, by making a copy of a portion of an item's content. Such uses have been predicated on the library having physical ownership of the copies in its collection, having the right to lend such copies, and having the right through exceptions in copyright law to make copies in accordance with specific criteria. With the introduction of digital resources, and in particular networked resources, the proprietary relationship has changed, and the library's entitlement to make those resources available for use is increasingly governed by contractual licence. The application of copyright in a digital environment, as reflected in recent judicial decisions and amendments to copyright law, is also significant.

From a technical perspective, the increased complexity associated with access rights to networked resources will have a significant impact on the interface between those resources and the library catalogue. The interface will have to function as more than a simple link from the catalogue record to the resource described. The resource interface may in fact have to be reconceptualized to function in tandem with the authentication procedures in the user interface to support the administration of terms and conditions embodied in contractual licences and perhaps even to monitor uses permitted under copyright law.

The Abstract/Index Interface

The tools produced by abstracting and indexing services have always been used by libraries as a pragmatic means of extending bibliographic access to the contents of their collections. Such tools provide a level of content analysis for journal literature, conference proceedings, compilations, and anthologies that libraries are rarely able to provide through the catalogue itself.

The application of online technology both to the library catalogue and to abstracting and indexing tools has served not only to improve access but also to increase the efficiency of the interfaces between the abstracting and indexing tools and the library catalogue in a number of ways. The most notable impact on the interface has been the effective integration of access to data describing articles, papers, etc. contained in the journals and conference proceedings held by an individual library with access to data recorded at the serial or monographic level in the library's catalogue. Integrated access has been made possible, in large part, through the widespread use of standard identifiers such as ISSNs and ISBNs both in the citations that are created for the abstracting and indexing tools and in the monographic and serial records created for library catalogues. Additional support for integration has come from the systematic enhancement of serial records through initiatives such as the CONSER A&I Project to include structured data fields identifying specific tools in which the contents of the serial described in the catalogue record are indexed. With those kinds of data links in place it has been possible for libraries to extract customized subsets of abstracting and indexing data relevant to their individual collections and to use their local OPAC software to provide access to their holdings of serials and conference proceedings at an analytical level.
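The role of the standard identifier in that kind of extraction can be sketched as follows in Python. The ISSN values are made up for the example (their check digits are not validated), and the record layout and function names are assumptions rather than a description of any vendor's data or software.

```python
# Hypothetical local serials holdings, keyed by ISSN (made-up values).
HOLDINGS = {
    "0000-0001": "Journal of Examples",
    "0000-0002": "Proceedings of a Sample Society",
}

# Hypothetical citations as they might arrive from an A&I service.
CITATIONS = [
    {"article": "First article", "issn": "00000001"},
    {"article": "Second article", "issn": "0000-0003"},
    {"article": "Third article", "issn": "0000-0002"},
]


def normalize_issn(issn):
    """Reduce an ISSN to eight characters so that variant forms compare equal."""
    return issn.replace("-", "").replace(" ", "").upper()


def local_subset(citations, holdings):
    """Keep only the citations whose serial is held, tagged with the local title."""
    held = {normalize_issn(issn): title for issn, title in holdings.items()}
    subset = []
    for citation in citations:
        key = normalize_issn(citation["issn"])
        if key in held:
            subset.append({**citation, "held_as": held[key]})
    return subset


subset = local_subset(CITATIONS, HOLDINGS)
```

The match succeeds only because both the citation and the catalogue record carry the same identifier, which is the condition the text above describes.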

As abstracting and indexing databases move to a networked environment, and as the scope of A&I services is extended increasingly to coverage of electronic journals and other networked resources, the relationship between the user, the catalogue, the A&I database, and the electronic resources that both the catalogue and the A&I databases provide access to has the potential to be substantially altered. Standard search and retrieval protocols open up the possibility of providing another alternative to the OPAC as a means of accessing analytical data derived from multiple A&I sources through a single search. In addition, where the journal or other source referenced in a citation is in electronic form, accessible through the Internet, networking technology makes it possible for the creators of A&I databases to link their citation data directly to the electronic article or document cited. Technically speaking, routing the output from an A&I search through the library catalogue in order to provide the user with a copy of the article or document cited is no longer a pre-requisite.

What remains, however, is a need, at least in certain cases, for the library to serve as an intermediary in validating the user's access rights to the electronic resource. If access to the resource is restricted to licensed subscribers, and the user is accessing the resource as a user of a particular library, it will be necessary to verify that the user is entitled to access under the library's licence. Making that connection between the user and the library thus becomes a legal pre-requisite, and introduces added complexity to the relationship between the user, the A&I database, and the electronic resource. In effect, it becomes necessary in a networked environment to re-establish an interface between the A&I database and the library catalogue that will support user access to electronic resources in the library's collection. In function, that interface is not entirely dissimilar to the interface between A&I data and catalogue data that has been established to operate at the local level through OPAC software.
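A minimal sketch in Python may help to show where that verification sits between the citation and the resource. The licence table, the resource identifiers, and the function names are hypothetical, and the decision to refuse access when no licence is on record is an assumption of the example, not a statement of how any licensing arrangement actually works.

```python
# Hypothetical licence data: resource id -> libraries whose users are covered.
LICENCES = {
    "ejournal-1": {"Library A", "Library B"},
    "ejournal-2": {"Library A"},
}


def entitled(user_library, resource_id, licences):
    """Return True if the user's library holds a licence covering the resource."""
    # Assumption: a resource with no licence entry is treated as unlicensed,
    # so access is refused rather than granted by default.
    return user_library in licences.get(resource_id, set())


def link_from_citation(citation, user_library, licences):
    """Hand back the article URL only when the entitlement check succeeds."""
    if entitled(user_library, citation["resource_id"], licences):
        return citation["url"]
    return None


citation = {
    "resource_id": "ejournal-2",
    "url": "http://publisher.example/ejournal-2/article-7",
}
for_library_a = link_from_citation(citation, "Library A", LICENCES)
for_library_b = link_from_citation(citation, "Library B", LICENCES)
```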

The Union Catalogue Interface

The union catalogue has traditionally functioned as a means both of extending the reach of the local catalogue and of supplementing its scope. Holdings reported to union catalogues have served to make the reporting library's collection accessible to a wider group of users than would normally be served by the local catalogue. In turn, having access to union catalogues has served as a means of meeting user needs that cannot be fulfilled through the local catalogue.

With the application of online technology to both the local catalogue and the union catalogue, the interface between the two began to change in a number of technical respects, but its basic nature remained largely unaltered. Holdings that had previously been reported in the form of cards, printed lists, or microform began to be reported in machine-readable form, first on tape and subsequently through file transfer protocols. Editing and de-duplication processes were automated to the extent possible, but continued to be supplemented through manual follow-up procedures.

Initially the introduction of online technology in fact had less impact on the interface between the local catalogue and the union catalogue than it had on the user interface to the union catalogue itself. The new search capabilities that were available through online technology served to make the union catalogue, for the first time, and in most respects, as effective an access tool as the local catalogue. Prior to computerization, the union catalogue had functioned in a much more limited way than the local catalogue, largely because the physical constraints of the card catalogue and the labour required to compile and edit the catalogue made its implementation as anything other than a single entry catalogue impractical. But once the card catalogue was replaced with a machine-readable database it became possible to exploit the power of online technology as fully with the union catalogue as with the local catalogue.

With the introduction of the Internet, however, there has emerged an alternative means of extending the reach and supplementing the scope of the local catalogue. With networked support for search and retrieval protocols such as Z39.50, the union catalogue has been reconceived as the virtual union catalogue. The potential advantages to be gained through implementation of a virtual union catalogue are considerable: elimination of the costs associated with compiling and maintaining a separate union catalogue database, more flexibility in establishing the scope of libraries to be included in a union catalogue search, more timely "reporting" of new accessions and withdrawals, and a seamless interface to data on the current availability of an item targeted for loan. What remains to be seen, though, is whether implementation of the supporting protocols can be managed in such a way as to realize those potential benefits across a critical mass of library systems. The other key challenge for the virtual union catalogue is to find a means of achieving "on the fly" what has been achieved in the conventional union catalogue through systematic quality control and de-duplication procedures.

Areas of Focus for Future Development

Addressing the challenges posed by the migration of the catalogue to a networked environment is going to require the involvement of the library community in a multiplicity of assessment and development initiatives. The range of issues raised as a result of technological change is broad and complex. Each of the interfaces to the catalogue is affected in different ways, and new interdependencies have emerged between and among the interfaces.

Stepping back and looking at the impacts in the aggregate, there would appear to be three broad areas in which future development needs to be focused. The first centres on the data itself. If the catalogue is to function as an effective tool for facilitating access to networked resources, we need to ensure that the data recorded in the catalogue is adaptable to the description of those resources and that it is adequate to support the various applications that will draw on it. The second relates to the functionality supported by the interfaces. Again, if the interfaces are to support a wider range of functions and to operate within a more complex architecture, we will need to ensure that the requirements and interdependencies are fully understood. Thirdly there is the issue of strategic positioning of the catalogue. This new environment requires extensive rethinking not just of how the technology can be exploited, but also of how the catalogue, and by extension the library itself, can be repositioned to meet the needs of its users.

Reassessing Data Requirements and Conventions

In comparison with the scope of technological change that has occurred with the migration of the library catalogue to a networked environment, there has been relatively little change to date in the bibliographic conventions used by libraries to compile data for those catalogues. Cataloguing rules have been updated in an incremental way over the past three decades to accommodate the description of an evolving repertoire of information carriers, and MARC formats have been enhanced to some extent to respond to current technical developments in data management, but the rules and formats remain strongly rooted in earlier technologies, and there is a growing gap between the conventions reflected in cataloguing rules and formats and the technological environment within which the catalogue currently operates.

As the nature of the resources available through the Internet and the World Wide Web evolves, and as the user's approach to resource discovery changes in response to features built into browsers, search engines, and other tools available on the network, it is essential for libraries to take a closer look at the data used in resource discovery and the way in which it is used. That process might usefully start with a review of the matrices developed for the Functional Requirements for Bibliographic Records that mapped attributes and relationships associated with the various entities reflected in catalogue records to the generic user tasks: find, identify, select, and obtain.[1] What needs to be determined is whether there are attributes or relationships associated with networked electronic resources (at either the logical or the data element level) that have significant value to the user engaged in resource discovery that are not currently reflected in catalogue records. That review needs to focus not only on data required to assist the user in finding resources in response to a search query, but also on data required to assist the user in assessing the relevance of the resources found and determining the usability of the resource from a technical perspective as well.

At a deeper level there is a need to revisit the cataloguing conventions that are currently used to describe resources in library collections and to determine standard access points and citation forms for the works contained in those resources. The analysis of the Anglo-American Cataloguing Rules that was undertaken recently for the Joint Steering Committee revealed a number of structural issues relating to the internal logic of the cataloguing code that need to be addressed if AACR is to serve as an effective tool for cataloguing digital resources.[2] Embedded in the logic of the code there are implicit assumptions derived from the traditional view of the resource as a physical object that make the application of the rules to networked resources highly problematic. A key issue to be addressed is how to adapt cataloguing data conventions to accommodate the description of resources whose content is not fixed in the way it was in non-digital media and is so susceptible to transparent alteration and extension.

On another front, data requirements for support of the interface between the library catalogue and the resources described in the catalogue need to be reassessed in the context of the direct linking to networked resources that is now possible. Libraries need to evaluate the relative strengths of the various identifiers that might be used to support the link from the catalogue record to the resource and determine how to achieve the persistency that is required of that link. Over and above the link itself there is a need to determine data requirements related to access rights. Although data relating to the "purchase" of a resource has not normally been recorded in the catalogue per se, logically such data, being both library-specific and resource-specific, needs at least to be linked to the data the library maintains in the catalogue to identify the resource, and the resource interface needs to draw on and link both types of data.

By extension, data relating to access rights acquired by the library will come into play as well in the union catalogue interface. With the addition of networked electronic resources to library collections there will be a need to indicate whether access to a particular resource is restricted to the library's direct users, or whether access through an arrangement analogous to interlibrary loan is possible, and if so, under what conditions. In that context there may be a need for additional data relating to access rights acquired, for example, through a consortium licence, that would be relevant to a user conducting a protocol supported search of a virtual union catalogue.

Re-examining the Interfaces

As noted earlier, to understand the way the library catalogue functions in a networked environment the catalogue needs to be viewed not simply as a database but more broadly as the interaction between the database and the applications that interface with it. To understand how the catalogue's functionality can be optimized in a networked environment, it is necessary, therefore, to re-examine not just data requirements but the functional requirements supported by the interfaces as well.

Looking, for example, at the changes that have occurred in the transition from the manual catalogue to the OPAC, and in turn from the OPAC to the networked catalogue, it is clear that the functionality supported by the user interface has changed significantly, with increased search capabilities and greater flexibility of display. However, when comparing the support that a typical client application offers for organizing a display of multiple records for various versions and editions of the same work with the logical sequencing of those same records in a card catalogue, it is not always so clear that the functional support provided by the online interface is an improvement over its predecessor.[3]

A similar observation can be made regarding the union catalogue interface. In a pre-networked environment, the usability of the union catalogue was heavily dependent on the editing and de-duplication processes that were an extension of the "reporting" mechanism, and in effect part of the interface between the local catalogue and the union catalogue. With the implementation of the Z39.50 protocol and the development of the virtual union catalogue, those editing and de-duplication processes have been relocated, as it were, to the client application, where they have to be executed "on the fly" with each results set. Current implementations of Z39.50 client software in fact seldom provide that level of functionality. Add to that the shortcomings of client applications in supporting logically organized displays of results sets, and it should be fairly evident that further development is needed to bring the interface with the virtual union catalogue up to par.[4]
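What executing those processes "on the fly" would involve at the client can be sketched in Python. The sketch assumes the results sets have already been retrieved from each target and reduced to simple dictionaries; it makes no Z39.50 calls, and the sample records, the identifier values, and the matching rule (ISBN where present, otherwise normalized title and year) are assumptions chosen for the example.

```python
import re

# Hypothetical results sets, one per target library (all values made up).
RESULT_SETS = {
    "Library A": [
        {"title": "The Example Book", "isbn": "0-00-000000-0", "year": 1999},
        {"title": "A Second Title!", "year": 1985},
    ],
    "Library B": [
        {"title": "Example book, The", "isbn": "0000000000", "year": 1999},
        {"title": "a second  title", "year": 1985},
    ],
}


def match_key(record):
    """Build a key under which records for the same item should collide."""
    if record.get("isbn"):
        return ("isbn", record["isbn"].replace("-", ""))
    # Without a standard identifier, fall back to a normalized title and year.
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return ("title", " ".join(title.split()), record.get("year"))


def merge_result_sets(result_sets):
    """Collapse duplicate records and accumulate the holding libraries."""
    merged = {}
    for library, records in result_sets.items():
        for record in records:
            entry = merged.setdefault(
                match_key(record), {"title": record["title"], "holdings": []}
            )
            entry["holdings"].append(library)
    return list(merged.values())


merged = merge_result_sets(RESULT_SETS)
```

Even this small case shows the difficulty: the two records without an ISBN match only because their titles happen to normalize to the same string, which is the kind of judgement that editing procedures in a conventional union catalogue resolve in advance.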

A re-examination of the functionality incorporated into Z39.50 client software might be extended further to include an assessment of the potential for such applications to support a networked interface between the catalogue and abstracting and indexing databases. In a networked environment it is technically feasible to achieve through a Z39.50 or other protocol based interface what in a local OPAC environment could only be achieved by maintaining on a local server copies of records derived from commercially produced abstracting and indexing databases, pre-selected to correspond to the library's serials holdings. If protocol supported client software is to serve that purpose effectively, however, it will be necessary first to establish a broadly based framework for interoperability between client applications at the library end of the interface and target applications at the A&I end. In practical terms, the most effective means of developing such a framework would likely be through an extension of the work that is currently underway with the development of the Bath Profile.[5] Key elements in the interface that would need to be examined are the identifiers, both at the article level and at the serial level, that are now being used in A&I database citations for networked resources.

With the migration of the resource interface to a networked environment functionality issues of another kind emerge. As noted earlier, prior to the introduction of Internet and Web technologies there was in effect no technical means of fully supporting the interface between the catalogue and the collection of resources it described. OPAC technology could be used to generate a call slip (or its equivalent as a screen display), but from there it was left to the user or a library employee to manually retrieve the item from the stacks. Now with the capability of linking directly from the catalogue record to the resource described (at least in the case of networked resources) a new dimension of functionality is brought into play. The resource interface becomes in effect a resolution service, or at least the front end to a resolution service.

Technically, resolution in a networked environment is fairly straightforward. What is more complicated, however, is designing mechanisms that will facilitate resolution that is consistent with proprietary and contractual arrangements associated with a particular resource. It cannot be assumed that resolution from the description of a resource in a library catalogue directly to the originator of the resource will invariably be the preferred route. There will be cases where the library requires a connection to be made indirectly via a supplier or aggregator who manages the library's licence for access to the resource. There will also be cases where a connection to an archived version of the resource housed on a server maintained by the library itself is required. What needs to be examined more closely is whether the mechanisms embedded in the network per se will be sufficient to support the kind of selective routing that a library may require, or whether that kind of functionality needs to be built into the library's end of the resource interface.
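If that functionality were to be built into the library's end of the interface, the selective routing might look something like the following sketch in Python. The resource identifiers, the URLs, the policy table, and the function name are hypothetical, and the rule of falling back to the originator is an assumption of the example.

```python
# Hypothetical access routes known for each resource.
ROUTES = {
    "journal-x": {
        "originator": "http://publisher.example/journal-x",
        "aggregator": "http://aggregator.example/journal-x",
    },
    "report-y": {
        "originator": "http://agency.example/report-y",
        "archive": "http://library.example/archive/report-y",
    },
    "site-z": {"originator": "http://society.example/site-z"},
}

# Hypothetical library policy: which route each resource should take.
POLICY = {"journal-x": "aggregator", "report-y": "archive"}


def select_route(resource_id, routes, policy):
    """Return the URL the library prefers for a resource."""
    available = routes[resource_id]
    preferred = policy.get(resource_id, "originator")
    # Fall back to the originator if the preferred route is not on record.
    return available.get(preferred, available["originator"])


via_aggregator = select_route("journal-x", ROUTES, POLICY)
via_archive = select_route("report-y", ROUTES, POLICY)
direct = select_route("site-z", ROUTES, POLICY)
```

Keeping the policy in a table separate from the routes reflects the observation above that the preferred route is a matter of the library's own arrangements rather than a property of the resource.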

Repositioning the Catalogue

Optimizing the performance of the library catalogue in a networked environment will clearly require a significant level of effort in the technical redesign of data structures and applications. Much of that work will have to be carried out at an international level, and will involve a significant degree of cross-sector cooperation. But optimizing performance through the exploitation of network technologies is not all that will be required to position the library catalogue strategically within this new environment.

The technology that supports the direct linking of catalogue records to the electronic resources they describe is also being used to support links to those same resources from a wide range of network browsing services, Web directories, indexing tools, and publishers' databases. The same technology also supports direct links from references and citations embedded in an electronic document to the resources referenced. Likewise, the technology that supports the horizontal extension of the local catalogue through the virtual union catalogue or through a networked interface between the catalogue and an A&I database is being used in other sectors as well to extend local functionality for resource discovery across multiple sources of data. What all this means, of course, is that the library catalogue functions as just one of many access paths available to the user in search of electronic resources on the network.

Positioning the library catalogue as a primary access mechanism within this environment will require a strategic focus not only on the technologies that are being broadly deployed throughout the network, but also on those aspects of the catalogue that are integral to its design and serve to differentiate it from other access mechanisms. One such element is the cataloguing process. The value of the catalogue as an access mechanism is derived in large measure from the quality control inherent in the data creation process: in the consistent application of descriptive standards, the control of name and title access points through authority files, the development of subject thesauri and classification schemes, and the standardization of formats and coding for machine-readable records. Added value is derived as well from the wide-scale adherence to cataloguing standards within the library sector, which means that in the aggregate library catalogues have the potential to function effectively as an integrated access mechanism to an enormous store of resources.

Equally important from a strategic perspective is the fact that the library catalogue functions as a guide to a collection of resources professionally searched, selected and maintained for the purpose of supporting the research and information needs of a defined community of users. With the exponential growth that characterizes the Internet, the selectivity and pre-determination of relevance that are reflected implicitly in the library catalogue take on even greater value. The library catalogue also differs from many of the newer access mechanisms on the network in that it has a retrospective as well as a current dimension to its design and function. The fact that as an access tool the library catalogue, like the library collection itself, has an archival function is of critical importance in a networked environment so widely evanescent in nature.

Setting the agenda for the adaptation and development of the library catalogue to function more effectively in a networked environment is in itself a challenging task. Clearly there is a need to exploit new technologies as fully as possible. Likewise, there is an increasing need to factor cross-domain interoperability into the equation. But there is also a need to retain and enhance to the extent possible those features of the catalogue that have served over time to make it an effective tool for its users and that give it the potential to outperform other resource discovery tools in this new environment.


