Z39.50 Attribute Architecture

Draft for Final Review

December 1998
Revised January 12 (miscellaneous changes based on comments)
Revised January 13, 1999 (see last part of 3.1.3 and new section 3.1.4, and annotations in 3.2.5)

Please review and comment by January 15, following which, version 1 will be issued. Mail comments to [email protected] or to the ZIG List no later than January 15.

1. Introduction and Preliminary Notes

1.1 Historical Background

The initial attributes for the bib-1 attribute set were developed by representatives of the Library of Congress, RLG, OCLC and WLN in the mid-1980s. This U.S. set was merged with a similar set from European library system developers to become bib-1. It was the only attribute set contained in the published version of Z39.50-1992 (version 2).

Problems with the bib-1 attribute set began to surface at that time. Within the bibliographic community, implementors had no published definitions of the bib-1 attribute semantics, so each vendor implemented the bib-1 attribute set with its own interpretation of attribute usage. A document was produced to clarify this (see ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt), although it was never formally included as part of the standard.

As the Internet grew, more communities wanted to implement Z39.50 and, in turn, needed additional attributes (beyond those already in bib-1) to reflect the types of data they wanted to exchange. This proved difficult as Z39.50-1992 did not allow a query to include attributes from more than a single attribute set. Since bib-1 was the only publicly visible set, it was expanded to accommodate the needs of these communities. Thus, bib-1 grew without plan or rigor, evolving away from the bibliographic community where it had started, and "bib-1" became somewhat of a misnomer as it grew into a global set of attributes.

In 1994 and 1995, as Z39.50 version 3 was being finalized and as Z39.50 began to be widely implemented, additional concerns arose over the relationships among attribute sets that other groups were developing, notably the STAS and GILS attribute sets. The Z39.50 Implementors Group (ZIG) had many questions about the development and implementation of multiple attribute sets, including duplication of attributes across sets. In early 1996 a discussion paper by Cliff Lynch (Defining and Maintaining Attribute Sets for Use with the Z39.50 Protocol: A Discussion Paper) detailed the issues:

  1. Duplication of common attributes in specialized attribute sets, due to the limits of the version 2 query.
  2. Interoperability problems due to attribute set proliferation, for example, how to know which basic attributes were embedded in specialized sets.
  3. Ambiguities in the semantics of attributes.
  4. Lack of rigorous semantics in the bib-1 attribute set; lack of a scope statement for the bib-1 attribute set; lack of consultation with the broad community concerned with bibliographic records.
  5. Lack of guidance about the semantics of mixing attributes from different attribute sets in a single Z39.50 query (and in particular, in a single query operand).

Following discussion of these issues at the February 1996 ZIG meeting, Lynch volunteered to bring together a group of interested people to recommend resolutions of the issues. The group met three times. Lynch prepared interim reports that were discussed at subsequent ZIG meetings. The final report of the group was presented at the January 1998 ZIG meeting. The current text of the new architecture includes revisions based on discussions then and at the June 1998 ZIG meeting.

The major conclusion of the group was that a new architecture for attribute sets should be developed; they went on to recommend an architecture based on classes of attribute sets, with expanded attribute types. Another major conclusion was that expert communities, rather than the ZIG, should be responsible for developing and maintaining attribute sets (following the example set by GILS and STAS). Notably, they recommended that the bibliographic community, rather than the ZIG, develop the next generation of bibliographic attributes. The ZIG should continue to be responsible for attributes that are general to Z39.50, that is, not specific to a given community.

1.2 Brief Technical Background

Z39.50 defines a number of query types, and requires support for the type-1 query (support for the other defined query types is optional). This document addresses the Z39.50 type-1 query only.

The type-1 query consists of one or more search terms, each with a set of attributes, specifying, for example, the type of term (author, title, subject, etc.), whether the term is truncated, its structure, etc. The server is responsible for mapping attributes to the logical design of the database.

A term in a type-1 query, together with its accompanying collection of attributes, is called an operand. Operands may be combined in a type-1 query, linked by boolean operators (And, Or, And-not, and Proximity).

Each attribute is a pair: an attribute type and a value of that type. An Attribute set defines a set of attribute types, and for each type defines the set of possible values.

An attribute set definition is assigned an object identifier, referred to as its attribute set identifier.

Example: The bib-1 attribute set defines a number of attribute types; one of which is Use. For bib-1 Use attributes, many attribute values are defined, one of which is personal name. Each type is assigned a numeric value, and each value is assigned a numeric value: type Use is assigned the value 1, and Use attribute Personal Name is assigned the value 1. Thus bib-1 Use attribute Personal Name is represented as the pair (1,1). This pair is further qualified by the bib-1 attribute set identifier (1.2.840.10003.3.1) to distinguish it from the pair (1,1) that may be defined by another attribute set.
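
The qualification described in the example can be illustrated with a small Python model. This is a sketch only: the class names Attribute and Operand, the search term, and the second OID are assumptions introduced for illustration. Only the bib-1 identifier and the pair (1,1) come from the text above.

```python
from dataclasses import dataclass
from typing import Tuple

BIB1_OID = "1.2.840.10003.3.1"   # bib-1 attribute set identifier (from the text)
OTHER_OID = "1.2.3.4.5"          # hypothetical OID of some other attribute set


@dataclass(frozen=True)
class Attribute:
    """An attribute: a (type, value) pair qualified by its attribute set."""
    set_oid: str       # attribute set identifier
    type_number: int   # numeric value assigned to the attribute type
    value: int         # numeric value assigned to the attribute value


@dataclass(frozen=True)
class Operand:
    """A search term together with its accompanying attributes."""
    term: str
    attributes: Tuple[Attribute, ...]


# bib-1 Use (type 1) with value Personal Name (value 1).
personal_name = Attribute(BIB1_OID, 1, 1)

# The same numeric pair defined by a different (hypothetical) set.
other_pair = Attribute(OTHER_OID, 1, 1)

# An illustrative operand: the term is made up.
operand = Operand("smith", (personal_name,))
```

Because the set identifier is part of the attribute, personal_name and other_pair compare as different even though both carry the pair (1,1).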

Version 2 of Z39.50 has two serious limitations inhibiting the development of attribute architecture, both corrected in version 3:

1.3 Limitations and Restrictions

1.3.1 Version 3 Assumption

There are several enhancements in version 3 pertaining to attribute sets and query construction; the two enhancements described at the end of 1.2 are certainly the most important, and are seen to be functional prerequisites for the development of an attribute architecture. For this reason, version 3 is assumed by this architecture, and version 2 is not addressed.

1.3.2 Type-1 Query Limitation

The Z39.50 type-1 query has known limitations, and the architecture specified in this document is restricted by these limitations. As the standard evolves and new versions are approved, the architecture may be expanded. See section 4: "Lessons Learned: Recommendations for Future Enhancements to the Z39.50 Query".

1.3.3 Semantic Indicator

In order to compensate for some of the type-1 limitations, it may be necessary to utilize the semantic indicator (provided within version 3) for purposes that would otherwise be accomplished by more coherent mechanisms if these limitations were not present. Future versions of Z39.50 are intended to address these limitations, obviating the need for extensive use of the semantic indicator at the attribute level.

2. Attribute Set Class Definitions

The attribute architecture allows definition of multiple attribute set classes. An attribute set class provides an umbrella context for the definition of an attribute set belonging to a particular class. It defines attribute types that may be included in an attribute set for that class. Attribute set Class 1 is defined as part of this architecture document (section 3).

This architecture strongly recommends that an attribute set definition that conforms to a particular class but defines attribute types that are not defined for that class should carefully define the interactions between the new attribute types and existing types defined for that class.

The architecture provides the attribute-set-class approach to allow flexibility and future expansion within the existing architecture. It is believed that attribute set Class 1 meets all known needs for an attribute class at this time. There may be other approaches developed which partition the set of attributes into fundamentally different types. This might result in the definition of a new attribute class inconsistent with Class 1. However, no need for such a separate class has been identified and it is not known whether additional classes will be necessary.

2.1 Attribute Values

These rules for construction of attribute values pertain to all classes. An Attribute set may define the set of values for a particular attribute type as follows:
  1. The attribute set definition may supply a finite list (where individual members of the list may be numbers or character strings) where a value for the attribute for that type is to be supplied from the list.
  2. The attribute set definition may define the type as numeric. For example, the value of an 'occurrence' attribute may simply be the actual occurrence, that is, to indicate "second occurrence of field N" the value of the Occurrence attribute would be 2.
  3. The attribute set definition may specify that a locally defined value, either a number or string, may be used as the value of the attribute for that type.
  4. The attribute set definition may specify that the attribute may take on a sequence of values, where each is any of the above (1, 2, or 3).

An Attribute value in an operand may thus be a number, string, or sequence of numbers and strings. A number value might take the role of 1 or 2 above, and a string value might take the role of 1 or 3; in each case, the role is interpreted by the attribute set definition.
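
As an illustration of these rules, a client or server might check the shape of an attribute value before interpreting it. The following Python sketch assumes hypothetical helper names (is_scalar, is_valid_shape, in_finite_list) and assumes that an empty sequence is not a meaningful value; neither assumption comes from this architecture.

```python
from typing import Iterable, Union

Scalar = Union[int, str]


def is_scalar(value: object) -> bool:
    """True for a number or a character string (rules 1-3)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, str)) and not isinstance(value, bool)


def is_valid_shape(value: object) -> bool:
    """True for a number, a string, or a non-empty sequence of those (rule 4)."""
    if is_scalar(value):
        return True
    if isinstance(value, (list, tuple)):
        # Assumption: an empty sequence carries no value and is rejected.
        return len(value) > 0 and all(is_scalar(v) for v in value)
    return False


def in_finite_list(value: Scalar, allowed: Iterable[Scalar]) -> bool:
    """Rule 1: the value must be drawn from the list in the set definition."""
    return value in set(allowed)
```

Whether a given number or string plays the role of rule 1, 2, or 3 cannot be decided from the value alone; as stated above, that interpretation belongs to the attribute set definition.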

3. Attribute Set Class 1

This class is intended to cover all known, existing needs at the time this document was finalized. (Existing attribute sets may need to be re-specified within this framework.)

The purpose of enumerating all of the possible attribute types within this "universal" attribute class is to provide a template for developers of attribute sets, and to set up a framework for interoperability among independently defined attribute sets which are intended to serve various communities. In particular, it should be possible for groups of content experts to develop new Access Point attributes, ASN.1 datatypes, comparison operators, and perhaps format/structure attributes which fit comfortably within this framework. Server developers can, based on the template defined here, recognize various attribute types that are omitted in a given query, as well as illegal repetitions or combinations of attributes of given types that would indicate a malformed query.

3.1 General Rules for Class 1

3.1.1 Semantic Precedence and Interaction among Sets

The context of this attribute class is in effect for a query when the OID of an attribute set conformant with this class is specified as the global OID (the object identifier within the type-1 query that does not accompany a specific attribute). For Class 1, the global OID is referred to as the dominant OID for the query. When attributes from different attribute sets are mixed within a query, and when the respective attribute set definitions conflict such that the resulting semantics are ambiguous, the semantics of the dominant set prevail. As an example, suppose attribute set definition A declares that the Language type is mandatory in an attribute list, while attribute set definition B declares it to be optional. If attribute set A is used as the dominant set for a query, then the Language attribute would have to be supplied within every operand; if attribute set B is the dominant set, it would not.
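
The Language example above can be sketched in Python as follows. The OIDs for sets A and B and the rule table are hypothetical; the type number 3 for Language is taken from the table in section 3.3. The sketch covers only the set-specific rule in the example, not the occurrence rules of Class 1 itself.

```python
from typing import Dict, Iterable, Set

LANGUAGE = 3  # Class 1 type number for Language (table in section 3.3)

# Hypothetical OIDs for the two attribute set definitions in the example.
SET_A = "1.2.3.1"  # declares the Language type mandatory
SET_B = "1.2.3.2"  # declares the Language type optional

# Types each set declares mandatory (illustrative data, not from any real set).
MANDATORY_BY_SET: Dict[str, Set[int]] = {SET_A: {LANGUAGE}, SET_B: set()}


def missing_mandatory(dominant_oid: str, operand_types: Iterable[int]) -> Set[int]:
    """Return the types the dominant set requires but the operand lacks."""
    return MANDATORY_BY_SET[dominant_oid] - set(operand_types)


# An operand carrying Access Point (1) and Comparison (8) but no Language.
operand_types = [1, 8]
```

With SET_A as the dominant OID the operand is reported as missing type 3; with SET_B as the dominant OID the same operand is acceptable.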

When an attribute set is intended to conform to Class 1, its definition should:

Interaction within a query operand between attribute sets conformant to this attribute set class and historical attribute sets not conformant to this class is undefined.

3.1.2 Populating Class 1 Attribute Sets

An attribute set consistent with this attribute class will define attributes of one or more of the types specified in 3.2.

Any Class 1 attribute set follows the rules prescribed for Class 1 that apply to attribute types defined for that set. However, a Class 1 attribute set need not define nor populate every attribute type defined for Class 1. A Class 1 attribute set may define as few as one attribute type, or as many as all of the attribute types defined for Class 1. Thus no specific attribute type is mandatory in the sense that it must be included in an attribute set definition. (Note: this use of "mandatory" is contrasted with the use of "mandatory" to mean that a particular attribute type may not be omitted in an operand, as for example, the Comparison attribute type.)

However, a Class 1 attribute set must use the numeric values in the "Type Number" column in the table in section 3.3, to represent the types; if any of these types is omitted in the attribute set definition, the definition should skip the value for that type rather than renumber.

An attribute set might be developed for an application or profile and may refer to values of a particular attribute type that are defined by a different attribute set. If all of the values required for the application are defined by that other attribute set, then that attribute type need not be defined for the new set.

There may often be a close relationship between the development of a profile for a particular application and the development of an attribute set definition to support the application. The profile might refer to several attribute sets in describing how to construct query operands (or entire queries). Thus the attribute set definition is not, itself, responsible for specifying all of the details of searching for the application when those details involve attributes from different attribute sets. However, the attribute set may offer as much commentary as it deems necessary and appropriate: for example, it may explain why a particular attribute has been omitted from its definition (because another attribute set has defined it), or how certain attributes that are defined in the set are to be combined with attributes from other sets.

3.1.3 Omitted Attributes

An attribute set definition should not specify a default value for an attribute type to be applied when that attribute type is omitted from an operand. Each individual server may determine the semantics of omitted attributes. Thus when a client omits an attribute of a given type from an operand (unless that type is not applicable for the given attribute combination, or unless the attribute type is mandatory) the client is, in effect, leaving it to the server to select a value. See also, "Omitted Attributes in Conjunction with Nested Access Point Attributes".

Temporary Editor's Note: The remainder of this section (3.1.3), added January 13.

The presence or absence of any attribute should not imply the presence of any other attribute, whether of the same or a different type. (For example, the presence of an Access Point attribute should not imply the presence of an otherwise omitted Format/Structure attribute, even if the relationship seems obvious.)

3.1.4 Syntactic Content of Search Term

Temporary Editor's Note: This section, 3.1.4, added January 13.

A query operand should be constructed such that the server may determine the syntactic content of the search term based on the ASN.1 datatype of the term as well as the Format/Structure attribute, if supplied (and if the Format/Structure attribute is not supplied, by the ASN.1 datatype alone). In general, the value of the Access Point attribute should not contribute to this determination.

Even in cases where there is only one legal value of a Format/Structure attribute, and when the client might expect the server to deduce that value, it should be explicitly supplied. An exception is when the ASN.1 type completely and unambiguously determines the format, for example when the ASN.1 type is INTEGER or GeneralizedTime, or when the Z39.50 Date/Time definition is supplied (as EXTERNAL); in these cases the Format/Structure attribute may be omitted. ASN.1 type InternationalString would not be considered to unambiguously determine the format.
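
A client-side check for this rule might look like the sketch below. The string labels for the ASN.1 types and the function name are assumptions made for illustration, and the set of unambiguous types holds only the examples named above; it is not an exhaustive list.

```python
# Labels for term datatypes; the spellings are illustrative assumptions.
UNAMBIGUOUS_TYPES = {
    "INTEGER",
    "GeneralizedTime",
    "Z39.50 Date/Time (EXTERNAL)",
}


def format_structure_may_be_omitted(asn1_type: str) -> bool:
    """True when the ASN.1 type alone determines the format of the term."""
    return asn1_type in UNAMBIGUOUS_TYPES


def check_term(asn1_type: str, has_format_structure: bool) -> bool:
    """True when the operand follows the rule for supplying Format/Structure."""
    return has_format_structure or format_structure_may_be_omitted(asn1_type)
```

Under this sketch an InternationalString term without a Format/Structure attribute fails the check, while an INTEGER term without one passes.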

An attribute set developer should determine all of the Format/Structure attribute values necessary to fully specify the term formats relevant to the attribute set, and for each, either include it as a Format/Structure value in the attribute set definition, or ensure that it is defined in another attribute set (and provide appropriate reference within the attribute set definition).

3.1.5 Repeatability

In general, if an attribute type is allowed to be repeatable, the semantics of repeating the attribute type must be well-defined.

While repeatability may be permissible for a given attribute type, as a general principle, an attribute type should not be repeated as a substitute for Boolean operations. To amplify this point, an attribute definition might prescribe how to interpret, for example, multiple Access Point attributes in a single operand. The definition might prescribe:

Note: the above three examples are for illustration only. There may be other possible interpretations for multiple Access Point attributes.

The definition may include a semantic indicator, allowing a client to select among several semantic alternatives. However, none of those alternatives should be to construct separate operands (linked by boolean 'and' or 'or') for each Access Point attribute -- the type-1 query supports boolean operations, so allowing another means of specifying boolean operations would add unnecessary complexity (in contrast to potential semantic interpretations of multiple Access Point attributes which cannot be otherwise represented via the type-1 query, as in the examples above).

Mechanism for Repeating Attributes

There are two mechanisms supplied by the Z39.50 standard for providing multiple attributes of the same type within an operand:
  1. Via 'list' within 'complex' CHOICE of 'attributeValue' within AttributeElement; defined in section 4.1 of Z39.50-1995, Abstract Syntax and ASN.1 Specification of Z39.50 APDUs. (This mechanism is provided by version 3, and not supported in version 2.)
  2. Via separate instances of AttributeElement.

Although Z39.50 provides both of these mechanisms, the first mechanism is prescribed for this class.
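
The difference between the two mechanisms can be illustrated with a simplified Python model. This is not the ASN.1 definition: the AttributeElement class below, its field names, and the merge_repeats helper are assumptions made for illustration. The sketch folds separate instances of the same type (mechanism 2) into a single element carrying a list of values (mechanism 1), keeping the order in which the values were supplied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Value = Union[int, str]


@dataclass
class AttributeElement:
    """Simplified stand-in for an AttributeElement (not the real ASN.1)."""
    attribute_type: int
    values: List[Value]            # one entry, or several in 'list' form
    set_oid: Optional[str] = None  # omitted means the dominant set applies


def merge_repeats(elements: List[AttributeElement]) -> List[AttributeElement]:
    """Fold repeated instances of a type into one element with a value list."""
    merged: Dict[Tuple[Optional[str], int], AttributeElement] = {}
    for element in elements:
        key = (element.set_oid, element.attribute_type)
        if key in merged:
            # Preserve supply order, which matters for nested access points.
            merged[key].values.extend(element.values)
        else:
            merged[key] = AttributeElement(
                element.attribute_type, list(element.values), element.set_oid
            )
    return list(merged.values())


# Two separate Access Point (type 1) instances plus one Comparison (type 8).
separate = [
    AttributeElement(1, [10]),
    AttributeElement(8, [3]),
    AttributeElement(1, [20]),
]
as_list = merge_repeats(separate)
```

Grouping by both set identifier and type is a design choice of this sketch; it keeps attributes from different attribute sets in separate elements.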

3.2 Attribute Types Defined within the Attribute Class

3.2.1 Access Point Attribute Type

The Access Point attribute defines either an intellectual access point (for applications that work with abstract database definitions) or an access point corresponding to a database fieldname (for applications where searching is defined in conjunction with a specific database schema, or defined to correspond to a specific Z39.50 tag set).

An attribute set definition that defines this type should include a discrete list of values.

Nesting and Anchoring of Access Point Attributes

Nesting of Access Point attributes may be supported by an attribute set definition. If so, nesting should be indicated by repetition of the Access Point attribute type (as prescribed in "Mechanism for Repeating Attributes"), where the order of nesting is as in the following example: field 1, field 2, and field 3, supplied in that order, means "field 3 within field 2 within field 1". An example of the use of nesting might be a field path within an SGML database.

An Access Point attribute may be indicated as not anchored (matching may occur beginning at any node within the element tree) by nesting it within an Access Point attribute of value 'wildpath' (for example as defined in the Utility attribute set). In the absence of a wildpath attribute, it is considered anchored (matching must occur from the root of the element tree).
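
The anchoring rule can be sketched in Python against the Description, Contact, and Availability schema used in the example in this section. The representation is an assumption: element paths are modelled as tuples of names, and the 'wildpath' value is modelled as a plain string sentinel supplied as the outermost Access Point.

```python
from typing import List, Sequence, Tuple

WILDPATH = "wildpath"  # sentinel standing in for the 'wildpath' value

Path = Tuple[str, ...]

# Leaf element paths of the example schema, from the root downward.
SCHEMA_PATHS: List[Path] = [
    ("Description",),
    ("Contact", "Name"),
    ("Contact", "eMail"),
    ("Contact", "Description"),
    ("Availability", "Contact", "Name"),
    ("Availability", "Contact", "eMail"),
    ("Availability", "Contact", "Description"),
]


def matching_elements(access_points: Sequence[str],
                      schema_paths: Sequence[Path] = SCHEMA_PATHS) -> List[Path]:
    """Return the schema paths matched by a nested Access Point list."""
    path = tuple(access_points)
    if path and path[0] == WILDPATH:
        # Not anchored: the supplied path may begin at any node in the tree.
        tail = path[1:]
        return [p for p in schema_paths if tail and p[-len(tail):] == tail]
    # Anchored: the supplied path must start at the root of the tree.
    return [p for p in schema_paths if p == path]
```

For instance, matching_elements(["Description"]) yields only the first-level Description, while matching_elements(["wildpath", "Description"]) yields all three Description elements.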

Example of Anchored vs. Not anchored:
Suppose a schema includes elements Description, Contact, and Availability, where Description is unstructured, Contact is structured into sub-elements Name, eMail, and Description, and Availability has sub-element Contact, similarly structured. Then:
  1. When the single Access Point attribute Description is specified as anchored, it is intended to match first-level element Description.
  2. If multiple Access Point attributes Contact and Description are specified as anchored, it is intended to match Description within first-level element Contact.
  3. If Contact and Description are specified as not anchored, it is intended to match Description within first-level element Contact, or Description within Contact within Availability.
  4. If the single Access Point attribute Description is specified as not anchored, it may match first-level element Description, Description within first-level element Contact, or Description within Contact within Availability.

Mixing Access Point Attributes from Multiple Attribute Sets

Mixing, within an operand, Access Point attributes from multiple attribute sets is permissible. Attribute sets might be defined that correspond directly to tag sets (which define Z39.50 retrieval elements). It may be desired to search on a field that corresponds to an element defined by a retrieval schema. A type-1 query operand might correspondingly be constructed with nested Access Point attributes corresponding to the elements in the tag path for the desired field. It may be that those elements are from different tag sets. Correspondingly, the Access Point attributes would belong to different attribute sets.

Omitted Attributes in Conjunction with Nested Access Point Attributes

When an attribute type is omitted, and when nested access points are specified (via multiple Access Point attribute values), the server will choose values for the omitted type based on the most specific access point in the list. For example, when searching field-1 within field-2, and the language attribute is omitted and the server must then select one, it should select it based on field-1, not field-2.
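
A server might implement this selection as in the following Python sketch. The per-field language table and its values are hypothetical server configuration, not part of this architecture. The sketch relies on the nesting order described earlier in this section: the last Access Point supplied is the innermost, and therefore the most specific.

```python
from typing import Dict, Optional, Sequence

# Hypothetical server configuration: default language per access point.
DEFAULT_LANGUAGE_BY_FIELD: Dict[str, str] = {
    "field-1": "fre",
    "field-2": "eng",
}


def choose_language(access_points: Sequence[str]) -> Optional[str]:
    """Pick a language for an operand that omitted the Language attribute."""
    # Outermost first, innermost last: the last entry is the most specific.
    most_specific = access_points[-1]
    return DEFAULT_LANGUAGE_BY_FIELD.get(most_specific)


# "field-1 within field-2" is supplied as field-2, then field-1.
selected = choose_language(["field-2", "field-1"])
```

Here the selection is driven by field-1, as in the example above.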

3.2.2 Qualifying Attribute Types

3.2.3 Query Management Attribute Types

These attributes have the property that they can be rewritten by the server as part of a revised query that the server returns to the client.

3.2.4 Comparison Attribute Type

The Comparison attribute defines the relationship between the term in the operand and the term in the term list at the server.

The presence of a Comparison attribute is mandatory in an operand, as it is presumed that there is always a relationship between the term and the value of the access point to which the term is compared (otherwise there would be no basis for comparison) and that the client knows the relationship; therefore, based on the principle stated in 3.1.3 ("Omitted Attributes") the client should always supply the relationship.

The Comparison attribute is a generalization of the bib-1 Relation attribute, though named differently to avoid confusion. (The bib-1 Relation attribute is not mandatory in bib-1, as bib-1 has no such rules of occurrence; nevertheless, there is always a relationship, implied or explicit. One of the problems with bib-1 that this class tries to correct is the potential ambiguity when the relationship is not supplied.)

An attribute set definition that defines this type should include a discrete list of values. This attribute is non-repeatable.

Sample values might include:

3.2.5 Format/Structure Attribute Type

This attribute is used primarily to help with the interpretation of a character-string term in cases where the Comparison attribute normally does not assume an ASN.1 datatype; it provides guidance for the datatype conversion process.

This attribute type should normally be omitted when the term value has strong datatyping. For example, Integers, intUnits and Object Identifiers may be directly expressed as such via the Z39.50 ASN.1 definition of Term. This applies also to Externals that evaluate to scalar quantities.

Developers of specific Access Point attributes should consider defining (or utilizing existing) ASN.1 datatypes to support their applications -- for example, personal names, dates, geospatial information (points and polygons). There will of course be cases where the ASN.1 approach to datatyping will be too heavy-weight; in those cases the Format/Structure attribute type can be used in conjunction with ASN.1 type InternationalString to indicate that the content of a string represents data in a specific format.

Temporary Editor's Note: The following sentence added January 13.

However, a character string term should not be used to represent a number (e.g. to represent the integer 123, the term should assume ASN.1 type INTEGER, rather than the character string '123'.)

Personal names are an interesting boundary case where one might argue either for an ASN.1 based definition or a Format/Structure attribute indicating a normalized name according to some rules; the choice of the appropriate approach is best left to a bibliographic-attribute-definition working-group.

An attribute set definition that defines this type should include a discrete list of values. This attribute is non-repeatable.

Dates

A date/time value might be expressed in any of the following forms:
  1. ASN.1 type GeneralizedTime,
  2. Via the Z39.50 ASN.1 Date/time definition,
  3. Via some other EXTERNAL definition for date and/or time,
  4. ASN.1 type InternationalString.

In case (4) a Format/Structure attribute should accompany the term, indicating for example, that the term is a normalized date. For (1)-(3) no Format/Structure attribute should be supplied.

Character String

A term which is to be treated as a literal character string, or as a word-oriented phrase subject to pre-processing by the target, should be accompanied by the Format/Structure attribute 'Character String'. Whether and what type of pre-processing applies should be indicated by an Expansion/Interpretation attribute.

Whenever the 'Character String' Format/Structure attribute is supplied, the order of the words in the supplied term is to be preserved (when preprocessing applies).

Whenever the 'Character String' Format/Structure attribute is supplied, the Term should be represented as ASN.1 type InternationalString.

Set

A Term may be recursively defined to be comprised of several component terms (where the term is represented as ASN.1 type EXTERNAL, using Multiple Search Term Format: MultipleSearchTerms-1.) In the case where the aggregate term is to be treated as a set, for purposes of set comparison (e.g. "subset of", "superset of", "set equality") a Format/Structure attribute value 'Set' should be supplied.

3.2.6 Occurrence Attribute Type

The value of this attribute is the desired occurrence of an access point, for example "second occurrence of field-1". This is a non-repeating, numeric attribute.

3.2.7 Indirection Attribute Type

The presence of this attribute indicates that the actual content of the term is not supplied; instead, a pointer to the term (e.g. a URL) is supplied in lieu of the actual term. An attribute set definition that defines this type should include a discrete list of values; e.g. URL, URN. This attribute is non-repeatable.

3.3 Enumeration and Summary of Class 1 Attribute Types

The table below enumerates and summarizes the Class 1 Attribute types.

An attribute set definition must use the numeric values in the "Type Number" column below to represent the types. If any of these types is omitted in an attribute set definition, the definition should skip the value for that type rather than renumber.

In the "Value" column, 'list' means that when an attribute set defines that type, the attribute set definition should include a discrete list of values for the type.

In the Repeatable column, if the value is "yes" (meaning that the attribute type is repeatable) an attribute set definition may declare the type to be non-repeatable, but if the value is "no", an attribute set may not declare the type to be repeatable. (When an attribute set definition declares a type non-repeatable, this means that the attribute type may not repeat within any operand of a query, when the attribute set is specified as the dominant set for the query.)

In the Occurrence column, "mandatory" means that the attribute type must occur in any operand (it does not mean that a given attribute set must define that type). "Optional" means that in general the attribute type need not occur in every operand; however, a specific attribute set definition may declare that the attribute is mandatory (or mandatory in certain circumstances) in which case, the rules specified would be in effect when the attribute set is specified as the dominant set for a query. An attribute set definition may not declare an attribute type to be optional if it is listed as "mandatory" in this table.

Attribute Type            Type Number  Value    Repeatable  Occurrence  Roughly-corresponding Bib-1 Type
Access Point              1            list     yes         mandatory   Use
Semantic Qualifier        2            list     yes         optional    (new)
Language                  3            list     yes         optional    (new)
Content Authority         4            list     yes         optional    (new)
Expansion/Interpretation  5            list     yes         optional    Truncation and some of Relation
Normalized Weight         6            numeric  no          optional    (new)
Hit Count                 7            numeric  no          optional    (new)
Comparison                8            list     no          mandatory   Relation and part of Completeness
Format/Structure          9            list     no          optional    Structure
Occurrence                10           numeric  no          optional    (loosely) Completeness
Indirection               11           list     no          optional    (new)
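
The Repeatable and Occurrence columns lend themselves to mechanical checking. The Python sketch below transcribes those two columns and applies them in two hypothetical helpers: check_operand tests the attribute types present in one operand, and check_definition tests whether an attribute set definition only tightens the class rules. The function names and message wording are assumptions made for illustration, and the sketch ignores any stricter rules a dominant attribute set might add.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TypeRule(NamedTuple):
    name: str
    repeatable: bool
    mandatory: bool


# Transcribed from the table above, keyed by Type Number.
CLASS1: Dict[int, TypeRule] = {
    1: TypeRule("Access Point", True, True),
    2: TypeRule("Semantic Qualifier", True, False),
    3: TypeRule("Language", True, False),
    4: TypeRule("Content Authority", True, False),
    5: TypeRule("Expansion/Interpretation", True, False),
    6: TypeRule("Normalized Weight", False, False),
    7: TypeRule("Hit Count", False, False),
    8: TypeRule("Comparison", False, True),
    9: TypeRule("Format/Structure", False, False),
    10: TypeRule("Occurrence", False, False),
    11: TypeRule("Indirection", False, False),
}


def check_operand(type_numbers: Iterable[int]) -> List[str]:
    """Check one operand; type_numbers has one entry per supplied value."""
    counts = Counter(type_numbers)
    problems = []
    for number, rule in CLASS1.items():
        if rule.mandatory and counts[number] == 0:
            problems.append(f"missing mandatory type {number} ({rule.name})")
        if not rule.repeatable and counts[number] > 1:
            problems.append(f"type {number} ({rule.name}) repeated")
    return problems


def check_definition(declared: Dict[int, Tuple[bool, bool]]) -> List[str]:
    """Check a set definition given as {type: (repeatable, mandatory)}."""
    problems = []
    for number, (repeatable, mandatory) in declared.items():
        rule = CLASS1[number]
        if repeatable and not rule.repeatable:
            problems.append(f"type {number} may not be declared repeatable")
        if rule.mandatory and not mandatory:
            problems.append(f"type {number} may not be declared optional")
    return problems
```

An operand with types 1 and 8 passes, an operand with type 8 supplied twice is reported, and a definition that declares Language (type 3) non-repeatable and mandatory is accepted because it only tightens the class rules.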

3.4 Attribute List Construction

Within a properly constructed operand, the attribute list should:

3.5 Utility Attribute Set

A Utility attribute set will be developed and maintained, consistent with Class 1, that will include commonly used (non-domain-specific) values for all of the Class 1 types.

4. Lessons Learned: Recommendations for Future Enhancements to the Z39.50 Query

As a result of the deliberations over this architecture, limitations posed by the type-1 query have resulted in identification of recommended enhancements that should be considered for a future version of Z39.50. These are documented here (additional contributions to this list are welcome):

Library of Congress