Library of Congress
A brief historical overview of Z39.50 is provided as context for
discussion of recent Z39.50 developments and future prospects.
Although the historical events leading to the development of Z39.50 are
sometimes traced back to the 1960s, momentum to standardize an information
retrieval protocol began to build in the early 1980s with the Linked Systems
Project (LSP), whose implementation began in 1982 and which became
operational in 1985. The participants were the Library of Congress, RLG, and
OCLC.
The essence of LSP was the Authorities application: the establishment and maintenance of a nationwide database of name authority records. Two application level protocols were developed: Record Transfer and Information Retrieval. The primary function of the Authorities application was the transfer of authority records between systems, supported by the Record Transfer protocol. A background function was the intersystem searching of authority records, supported by the Information Retrieval protocol.
Both the Record Transfer and Information Retrieval protocols were developed to support authority record exchange, but were intended to support record exchange and intersystem searching regardless of record type.
In 1983 the LSP participants submitted both protocols, Record Transfer and Information Retrieval, for consideration as American National Standards. For Record Transfer, attempts to standardize were eventually abandoned (and ultimately, the Record Transfer protocol itself was replaced by FTP).
There was, however, substantial interest within the U.S. in standardizing an information retrieval protocol, and the LSP Information Retrieval protocol was submitted to ANSI/NISO, which formed a committee that prepared it for ballot in 1984. At that point it was given the designation "Z39.50", as it is known today. (NISO was formerly named Z39, and continues to use that designation for its standards.) The 1984 ballot failed within NISO, for reasons beyond the scope of this paper (primarily because it was not yet sufficiently well-developed). There was significant further development over the next three years; Z39.50 was re-balloted in 1987, this time successfully, and was approved by ANSI in 1988.
Independently, in 1984, a work item was approved in ISO for a "Search and Retrieve" protocol, called SR. There were several drafts of the SR standard between 1984 and 1991, when it was finally approved. As difficult as it was to achieve consensus on Z39.50 in the U.S., it was more difficult to achieve international consensus on SR, because of the various conflicting national interests represented. The U.S. input was naturally influenced by Z39.50, which was itself not entirely stable during the period of SR development. The result was that several incompatibilities remained between SR and the 1988 version of Z39.50.
GILS, the Government Information Locator Service, is a response to the need for users to identify and locate publicly available Federal information resources. The GILS Profile provides the specifications for the overall GILS application, including the GILS "Core" data elements that comprise a GILS record describing an information resource, and the use of Z39.50 to search and retrieve GILS records.
The "Author-Title-Subject" profile aims to improve the reliability of
Z39.50 search results. When a client requests, for example, an author search,
the intent of the ATS profile is that the server will execute the search based
on its concept of author. If the server does not support an author search, it
should not re-cast the search, substituting some attribute other than author,
without the client's knowledge and consent. Neither should the server treat
the inability to perform a search as a successful search with no results.
The profile specifies the use of bib-1 within a type-1 query for searching by author, title, or subject, to provide basic search access to bibliographic databases.
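The server behaviour described above can be sketched in a few lines of Python. This is an illustration only, not part of the ATS profile: the ToyServer and SearchResponse names are hypothetical, the in-memory index stands in for a real database, and the numeric values are assumed to be the bib-1 Use attributes for title, subject, and author and the bib-1 diagnostic for an unsupported Use attribute.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed bib-1 Use attribute values for the three ATS access points.
USE_TITLE, USE_SUBJECT, USE_AUTHOR = 4, 21, 1003

# Assumed bib-1 diagnostic code for "Unsupported Use attribute".
DIAG_UNSUPPORTED_USE = 114


@dataclass
class SearchResponse:
    ok: bool                                  # False: the search was not performed
    hits: list = field(default_factory=list)  # record ids when ok is True
    diagnostic: Optional[int] = None          # set only when ok is False


class ToyServer:
    """Hypothetical server with one in-memory index per supported Use attribute."""

    def __init__(self, indexes):
        self.indexes = indexes  # {use_attribute: {term: [record ids]}}

    def search(self, use, term):
        index = self.indexes.get(use)
        if index is None:
            # Do not re-cast the search to some other attribute, and do not
            # report an empty result set: fail explicitly with a diagnostic.
            return SearchResponse(ok=False, diagnostic=DIAG_UNSUPPORTED_USE)
        return SearchResponse(ok=True, hits=index.get(term.lower(), []))


# This toy server supports title and subject searching, but not author.
server = ToyServer({USE_TITLE: {"ulysses": [1]}, USE_SUBJECT: {"dublin": [1]}})
by_title = server.search(USE_TITLE, "Ulysses")     # supported, one hit
no_match = server.search(USE_TITLE, "Dubliners")   # supported, zero hits
by_author = server.search(USE_AUTHOR, "Joyce")     # unsupported: diagnostic
```

The point of the sketch is that the client can tell "searched and found nothing" (ok is True, no hits) apart from "could not search" (ok is False, with a diagnostic), which is the distinction the profile asks servers to preserve.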
The WAIS (Wide Area Information Servers) profile specifies rules for access to WAIS servers supporting Z39.50 version 2.
In August 1995, the Library of Congress convened a team of
representatives from several institutions to develop a Z39.50 profile for
access to digital libraries. Participating organizations included Getty,
Berkeley, University of Michigan, University of California, OCLC, LC, RLG,
Chemical Abstracts Service, IBM, FCLA, TRW, Knight Ridder, and SilverPlatter,
as well as consultants and liaisons.
The scope was narrowed to navigation of digital collections, and the profile was named the Z39.50 Profile for Access to Digital Collections (Collections Profile). The larger problem of access to digital libraries was left to other profiling efforts, including the CIMI and Digital Library Object profiles described below. Other groups were initiating independent efforts to develop profiles aimed at specific types of objects and collections. The intention was to coordinate these efforts so that these latter profiles would be developed as compatible extensions or subsets of the Collections profile.
The profile aims to address the problem faced by libraries and other institutions that create collections, organized thematically -- by subject, creator, historical period, etc. -- with numerous, diverse objects, both digital and physical. These collections are often organized hierarchically and distributed across servers. Significant resources may be invested in digitization and in the intellectual efforts of aggregation, organization, and description of the information in a collection. Yet to a remote user or client, the collection may appear to be simply an accumulation of objects and undifferentiated data, because there are no agreed-upon semantics for navigating the collection to locate and retrieve objects of interest. Coherent organizational structures, imposed on the data, are necessary to provide a view that supports navigation.
A key obstacle to effective navigation is the inability to distinguish content from description. A primary goal of navigation is to locate and retrieve objects of interest; a vital step in that process is to locate relevant descriptive information. Thus it is useful to navigate among descriptive information as well as content, and consequently, to be able to distinguish content from description.
The profile exploits organizational structures to allow a client to navigate through structured information. A coherently defined set of descriptive data is used to manage and navigate collections of otherwise undifferentiated data. These organizational structures allow the data to be viewed as distributed, hierarchical collections. The objectives of the profile are to:
The Consortium for the Computer Interchange of Museum Information (CIMI)
has supported the development of a Z39.50 Profile as part of its current
Project CHIO (Cultural Heritage Information Online), for access to museum
information.
Museum information includes a variety of physical and electronic objects, including physical artifacts and electronic derivatives, descriptive records designed for collection management, full-text documents, and online tools such as thesauri and authoritative lists of artists' names.
A digital collection of museum information needs to address not only the heterogeneous nature of the information objects but also the fact that such a collection will draw upon repositories of museum information distributed around the world.
CIMI initiated Project CHIO as a demonstration project to investigate a standards-based approach for searching and retrieving cultural heritage information from disparate and distributed information systems containing museum information. Project CHIO consists of two interrelated demonstration projects -- CHIO Structure and CHIO Access -- to show respectively the utility of SGML and Z39.50, to enhance electronic access to cultural heritage museum information in a distributed, networked environment.
"CHIO Structure" uses SGML to mark up museum objects including (text) exhibition catalogues and wall text, and make them available for electronic access. "CHIO Access" demonstrates the utility of Z39.50 to access digitized museum objects.
Digital Library Objects
The Z39.50 Profile for Access to Digital Library Objects (DL
Profile) addresses functional and user requirements for search and retrieval
of information in digital library collections, specifically the Library of
Congress digital library collections and similar collections.
The profile provides a general and flexible model for the structure of a digital object. In the model, a digital object may consist of constituent parts, any of which may in turn consist of constituent parts, and so on. Consider, for example, a single digital object consisting of several images (e.g. photos or text images). Although the set of images comprises a single digital object, each must be distinctly representable and the object must convey the fact that there are distinct images, how many, and their individual characteristics. Thus they are represented as separate elements of a Z39.50 record.
Next suppose that the digital object not only includes a number of images, but also additional constituent parts, further structured; for example, each such constituent part may consist of several images. This introduces an intermediate level of aggregation. The model of a digital object adopted by the DL profile assumes arbitrary levels of aggregation and is represented as a tree, where each non-leaf node has an arbitrary number of subtrees and/or leaves, and leaf nodes represent data.
Every node, whether a leaf or non-leaf node, may have metadata attached, including description, date of creation, terms and conditions, etc.
This model will support, for example, a digital object representing 10 boxes, each with 20 folders, each with 30 photos. Z39.50 string tags such as 'box', 'folder', and 'photo' could be used to convey the type of element. As a more complex example, a folder might include a variety of photos, maps, correspondences, etc. and perhaps the correspondences consist of several sequential digitized pages.
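The tree model and the boxes/folders/photos example above can be sketched as a small Python data structure. This is an illustration under stated assumptions, not the record syntax defined by the DL profile: the Node class, its field names, and the build_object helper are hypothetical, and the root tag 'object' is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tag: str                                       # string tag: 'box', 'folder', 'photo', ...
    metadata: dict = field(default_factory=dict)   # any node may carry metadata
    children: list = field(default_factory=list)   # subtrees and/or leaves
    data: Optional[bytes] = None                   # only leaf nodes represent data

    def is_leaf(self):
        return not self.children

    def leaves(self):
        """Yield every leaf node at or below this node."""
        if self.is_leaf():
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

    def depth(self):
        """Number of levels from this node down to its deepest leaf."""
        return 1 + max((c.depth() for c in self.children), default=0)


def build_object(boxes=10, folders=20, photos=30):
    """Build the example object: boxes containing folders containing photos."""
    return Node("object", {"description": "example digital object"}, [
        Node("box", {"number": b}, [
            Node("folder", {"number": f}, [
                Node("photo", {"number": p}, data=b"")  # placeholder image data
                for p in range(1, photos + 1)
            ])
            for f in range(1, folders + 1)
        ])
        for b in range(1, boxes + 1)
    ])


obj = build_object()
photo_count = sum(1 for _ in obj.leaves())   # 10 * 20 * 30 = 6000 photos
levels = obj.depth()                         # object, box, folder, photo = 4
```

Because every node has the same shape, the more complex case (a folder mixing photos, maps, and multi-page correspondence) needs no new machinery: a correspondence is simply a non-leaf node whose children are page leaves.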
CIP, the Catalogue Interoperability Protocol, addresses the ability to
effectively exploit earth observation and associated data resources. That
capability is impeded by the lack of homogeneity in services and interfaces
offered by various data providers. CIP is being developed by the Protocol Task
Team within the Committee on Earth Observation Satellites (CEOS).
CEOS provides coordination between international Earth observation missions and encompasses various national (civil) agencies involved in Earth Observation satellite programmes: the European Space Agency, NASA, DLR (Germany), NASDA (Japan), DDRS (Canada), BNSC (UK), and CEO (Centre for Earth Observation).
The objective of CIP is to enable users to logically search physically distributed data catalogues, without separately querying each and merging/correlating result sets, effectively allowing the various data archives to appear to be a single database. It includes a data dictionary to specify the common attributes that describe the primary objects within a catalogue system.
CIP models collections, permitting complex hierarchical groupings of data organized thematically over multiple databases, where both the collections and the individual collection members (objects and subcollections) have item descriptors, roughly analogous to the descriptive records defined by the Collections profile.
A service named WORLD 1 will be offered by the National Library of
Australia to replace the current Australian Bibliographic Network and Ozline
services. The technical infrastructure to operate the WORLD 1 Service is being
developed as a joint venture by the National Library of Australia and the
National Library of New Zealand under the banner of the National Document and
Information System (NDIS) Project.
The plan is to use union catalogues as tools for identifying resources and their locations within a geographic area. The premise is that union catalogues with good coverage and authority control remain an attractive concept because of the limitations of multi-target searching: performance degrades when a search spans several targets, results are not well integrated, and result sets contain duplicate records and multiple versions of headings (e.g. author and subject).
Libraries contributing to a union catalogue would require a cataloguing system to update both their own local catalogue and the union catalogue in a single operation, and the project proposes to integrate the "cataloguing protocol" with Z39.50. To this end, they propose to use Z39.50 both for search and update, and they are profiling the Z39.50 Update Extended Service.
The Z39.50 Profile for STARTS (ZSTARTS) stems from the
Stanford Protocol for Internet Search and Retrieval (STARTS), an
initiative of the Stanford Digital Library Project. The STARTS project brought
together a number of commercial companies to develop requirements for
distributed searching and ranked retrieval. The ZSTARTS profile is a Z39.50
solution to these requirements.
The STARTS model assumes document databases. A client sends a query to multiple servers, where the query includes a filter and a ranking expression. The filter is analogous to the Z39.50 type-1 query (i.e. a boolean query), while the ranking expression supplies guidance for the server to rank results; the client may assign weights to individual terms. The STARTS model calls for the merging of the ranked results from the various servers.
Search results include document metadata: title, publication date, size, score (assigned to the document for the given search), occurrence information (pertaining to the terms in the query), and a pointer (URL) to the document for subsequent retrieval.
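The merging step can be illustrated with a short Python sketch. It is a simplification, not the STARTS or ZSTARTS algorithm: the Hit class and merge_ranked function are hypothetical names, only three of the metadata fields are modelled, and the sketch assumes that scores from different servers are directly comparable, which in practice may require normalization first.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str      # pointer to the document for subsequent retrieval
    title: str
    score: float  # score assigned by the server for this search


def merge_ranked(result_lists, limit=10):
    """Merge per-server result lists into one ranked list.

    Each input list must already be sorted by descending score.
    A document returned by several servers is kept once, at its best rank.
    """
    merged = heapq.merge(*result_lists, key=lambda h: h.score, reverse=True)
    seen, out = set(), []
    for hit in merged:
        if hit.url in seen:
            continue  # duplicate of a higher-ranked hit
        seen.add(hit.url)
        out.append(hit)
        if len(out) == limit:
            break
    return out


# Example with two servers; the URLs are placeholders.
server_a = [Hit("http://a.example/1", "Doc A1", 0.9),
            Hit("http://a.example/2", "Doc A2", 0.4)]
server_b = [Hit("http://b.example/1", "Doc B1", 0.7),
            Hit("http://a.example/2", "Doc A2 (copy)", 0.3)]

merged = merge_ranked([server_a, server_b])
merged_urls = [h.url for h in merged]
```

Using a lazy merge of pre-sorted lists means the client need not hold or re-sort every server's full result set to produce the top few merged hits.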
The Type 102 Ranked List Query (RLQ) was originally intended to be developed as a natural language query, but it was deemed impossible to design a query that adequately supports all of the natural language search methodologies. Type 102 RLQ has instead been designed for the ranked searching technologies used by large-scale commercial information providers and information industry software vendors, several of whom have participated in the development of this query, including:
Proposed SQL Query
The Distributed Database Unit at CRC for Distributed Systems Technology Centre at the Department of Computer Science, University of Queensland, is proposing changes to Z39.50 to support SQL databases. Their proposal includes: