Z39.50 Attribute Architecture

August 1998

1. Introduction and Preliminary Notes

1.1 Historical Background

The initial attributes for the bib-1 attribute set were developed by a team of representatives of the Library of Congress, RLG, OCLC and WLN in the mid-1980s. This set was merged with a similar set that had been developed by European library system developers to become bib-1. Bib-1 was the only attribute set contained in the published version (version 2) of Z39.50, in 1992.

Two problems surfaced about this time. There had been no publicly available written definitions of the semantics of the bib-1 attributes, so implementers were unclear about how to support them. Since this was the only visible set, and since version 2 did not allow a query to use more than a single attribute set, implementers wanted to add whatever attributes they needed to the bib-1 set. Thus the bib-1 set evolved away from the bibliographic community where it had started, and it grew without plan or rigor until it was viewed as a kind of global set. A document detailing the semantics was updated in 1995, but it is not part of the standard.

In 1994 and 1995, as Z39.50 version 3 was being finalized and as Z39.50 began to be widely implemented, the Z39.50 Implementors Group faced questions about the relationships among attribute sets that other groups were developing, notably the STAS and GILS attribute sets. Clifford Lynch prepared a discussion paper for the February 1996 ZIG meeting that detailed the issues:

  1. Duplication of common attributes in specialized attribute sets, due to the limits of the version 2 query.
  2. Interoperability problems due to attribute set proliferation; for example, how to know which basic attributes were embedded in specialized sets.
  3. Ambiguities in the semantics of attributes.
  4. Lack of rigorous semantics in the bib-1 attribute set; lack of a scope statement for the bib-1 attribute set; lack of consultation with the broad community concerned with bibliographic records.
  5. Lack of guidance about the semantics of commingling attributes from multiple attribute sets in a single Z39.50 query.
Following the discussion of these issues at the ZIG meeting, Lynch volunteered to bring together a group of interested people to recommend resolutions of the issues. The group met three times. Lynch prepared interim reports that were discussed at subsequent ZIG meetings. The final report of the group was presented at the January 1998 ZIG meeting. The current text of the new architecture includes revisions based on discussions then and at the June 1998 ZIG meeting.

The major conclusion of the group was that a new architecture for attribute sets should be developed; they went on to recommend an architecture based on classes of attribute sets, with expanded attribute types. Another major conclusion was that expert communities, rather than the ZIG, should be responsible for developing and maintaining attribute sets (as was the case with GILS and STAS). Notably, they recommended that the bibliographic community, rather than the ZIG, develop the next generation of bibliographic attributes. The ZIG should continue to be responsible for attributes that are general to Z39.50, that is, not specific to a given community.

1.2 Acknowledgements

The following people attended one or more of the attribute architecture meetings:
Ed. Note: List of meeting participants and other significant contributors to be listed here.

1.3 Brief Technical Background

Z39.50 defines a number of query types, and requires support for one of those types, the type-1 query. This document addresses the Z39.50 type-1 query only.

The type-1 query consists of one or more search terms, each with a set of attributes, specifying, for example, the type of term (author, title, subject, etc.), whether the term is truncated, and its structure. The server is responsible for mapping attributes to the logical design of the database.

A term in a type-1 query, together with its accompanying collection of attributes, is referred to as an operand. Operands may be combined in a type-1 query, linked by boolean operators (And, Or, And-not, and Proximity).

Each attribute is a pair: an attribute type and a value of that type. An Attribute set defines a set of attribute types, and for each, a list of possible values.

An attribute set definition is assigned an object identifier, referred to as its attribute set identifier.

Example: The bib-1 attribute set defines a number of attribute types; one of which is Use. For bib-1 Use attributes, many attribute values are defined, one of which is personal name. Each type is assigned a numeric value, and each value is assigned either a numeric value or a string. In bib-1, type Use is assigned the value 1, and Personal Name is assigned the value 1. Thus bib-1 Use attribute Personal Name is represented as the pair (1,1). This pair is further qualified by the bib-1 attribute set identifier (1.2.840.10003.3.1) to distinguish it from the pair (1,1) that may be defined by other attribute sets.
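The example above can be sketched in Python as follows. This is a minimal illustration only: the `Attribute` record and its field names are assumptions made for this sketch, and the second object identifier is hypothetical. It shows why the attribute set identifier is needed to tell one pair (1,1) from another.

```python
from dataclasses import dataclass
from typing import Union

BIB1_OID = "1.2.840.10003.3.1"      # bib-1 attribute set identifier, as given in the text
OTHER_OID = "1.2.840.10003.3.999"   # hypothetical identifier, for illustration only


@dataclass(frozen=True)
class Attribute:
    """An attribute: a (type, value) pair qualified by its attribute set."""
    attribute_set: str       # object identifier of the defining attribute set
    type: int                # numeric attribute type, e.g. 1 for bib-1 Use
    value: Union[int, str]   # numeric or string attribute value


# bib-1 Use (type 1) with value Personal Name (value 1)
personal_name = Attribute(BIB1_OID, 1, 1)
# The same pair (1,1) defined by another set is a different attribute
unrelated = Attribute(OTHER_OID, 1, 1)

print(personal_name == unrelated)  # False
```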

In version 2 of Z39.50, all attributes within a query must belong to a single attribute set (the query accommodates only a single, global attribute set id). In version 3, attributes may be combined from different attribute sets, within a single query, even within a single operand (an attribute set id may accompany every attribute). This is a significant enhancement, providing support for multiple database searching, and allowing attribute sets to be defined with less replication.
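As a rough illustration of the difference between the two versions, the following Python sketch checks whether a query could be expressed under the version 2 restriction. The tuple layout, the function name, and the second object identifier are assumptions made for this sketch, not part of the standard.

```python
from typing import List, Optional, Tuple

BIB1 = "1.2.840.10003.3.1"          # bib-1 attribute set identifier
OTHER_SET = "1.2.840.10003.3.999"   # hypothetical identifier, illustration only

# (attribute set id or None, attribute type, attribute value);
# None means "defer to the query's global attribute set id".
Attr = Tuple[Optional[str], int, int]


def expressible_in_version_2(global_set: str, operands: List[List[Attr]]) -> bool:
    """True if every attribute belongs to the single global attribute set."""
    return all(set_id in (None, global_set)
               for operand in operands
               for set_id, _type, _value in operand)


single_set = [[(None, 1, 1)], [(BIB1, 1, 4)]]
mixed_sets = [[(None, 1, 1), (OTHER_SET, 3, 7)]]   # two sets within one operand

print(expressible_in_version_2(BIB1, single_set))  # True
print(expressible_in_version_2(BIB1, mixed_sets))  # False
```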

Also in version 3, new data types for terms are defined (in version 2 only binary values are allowed).

1.4 Version 3 Assumption

There are several enhancements in version 3 pertaining to attribute sets and query construction; the two enhancements described at the end of 1.3 are certainly the most important, and are seen to be functional prerequisites for the development of an attribute architecture. For this reason, version 3 is assumed by this architecture, and version 2 is not addressed.

1.5 Limitations

The Z39.50 type-1 query has known limitations, and the architecture specified in this document is restricted by these limitations. As the standard evolves and new versions are approved, the architecture may be expanded.

1.5.1 Semantic Indicator

To compensate for some of the type-1 limitations, it may be necessary to use the semantic indicator (provided in version 3) for purposes that more coherent mechanisms would serve if these limitations were not present. Future versions of Z39.50 are intended to address these limitations, obviating the need for extensive use of the semantic indicator at the attribute level.

1.5.2 Nesting and Occurrence

Following text removed:
When specifying, for example, "field-1 within field-2" it will be possible to specify an occurrence, for example, "second occurrence of field-1, within field-2" but not, for example, "second occurrence of field-1, within second occurrence of field-2". That is, only a single node may carry occurrence, likely either the root or the leaf, and this should be statically specified within the attribute set definition. (This is a limitation of class 1, not a limitation of the type-1 query, and may be overcome by classes other than 1).
Replaced by:
Occurrence is not permitted in conjunction with nesting. Thus for example "field 1 within field 2" (nesting), or "second occurrence of field 3" (occurrence) may be specified but not "second occurrence of field 1, within field 2". This is a limitation posed by the type-1 query. In the future (either as an amendment to the type-1 query, or as part of a new query) nesting should be cast as an operator; thus in "A within B", 'within' would be analogous to 'and' in "A and B".
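The restriction can be sketched as a simple check in Python. The sketch assumes an operand is held as a list of (attribute type, value) pairs, that nesting is signalled by more than one Field Name attribute, and that the type numbers are those of the class 1 summary table in 3.5; the function name is hypothetical.

```python
FIELD_NAME = 2    # class 1 type numbers, from the summary table in 3.5
OCCURRENCE = 11


def combines_nesting_and_occurrence(operand):
    """Return True if an operand uses occurrence together with nesting.

    operand is a list of (attribute type, value) pairs. Nesting is taken
    to mean that more than one Field Name attribute is present.
    """
    field_names = sum(1 for attr_type, _ in operand if attr_type == FIELD_NAME)
    has_occurrence = any(attr_type == OCCURRENCE for attr_type, _ in operand)
    return field_names > 1 and has_occurrence


nested = [(FIELD_NAME, "field 2"), (FIELD_NAME, "field 1")]   # field 1 within field 2
second = [(FIELD_NAME, "field 3"), (OCCURRENCE, 2)]           # second occurrence of field 3
both = nested + [(OCCURRENCE, 2)]                             # not permitted

print(combines_nesting_and_occurrence(nested))  # False
print(combines_nesting_and_occurrence(second))  # False
print(combines_nesting_and_occurrence(both))    # True
```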

Following text deleted as these are now resolved.

1.6 Some Unresolved Issues

  1. Though there is clear consensus within the ZIG that nesting of Field Name attributes should be supported, there is division of opinion on whether nesting should be permitted for Abstract attributes.
  2. Similarly, there is division of opinion on whether specification of occurrence should be permitted for Abstract attributes.
  3. Some feel that the limitation described in 1.5.2, for occurrence, can be overcome, for example by use of the 'complex' form of attributeValue.
  4. It is not clear whether anchoring is sufficiently specified. See 3.2.1.2.
  5. 3.1.2 states that a class 1 attribute set may not define any attribute types not defined for class 1 (i.e. not defined in this document). Some feel this is overly restrictive.

2. Attribute Set Class definitions

An Attribute set class definition provides an umbrella context for the definition of an attribute set belonging to a particular class. A class definition defines all of the attribute types that may be included in an attribute set for that class.

Following added:

This architecture strongly recommends that an attribute set definition conforming to a particular class not include types that are not defined for that class.
Ed. Note: It seems that this question is still unresolved, that is, how strongly worded a position should the architecture document take on this point?

(At least one attribute set class definition will be developed, but it is not clear that more than one will be necessary.)

2.1 Mutual Exclusivity

This section moved here from 3.1.4 of the previous draft, as this seems more appropriate at the architecture level than at the class level.

An attribute class may declare that specific attribute types are mutually exclusive within a query operand (for example, Abstract and Field Name attributes of Class 1). Mutual exclusivity rules are to be defined at the level of the attribute class rather than specific attribute sets.

2.2 Attribute Values

This section moved here from 3.4 of the previous draft.

Although many attribute values are (and perhaps will continue to be) enumerated, an attribute value may take any of the following forms:

3. Attribute Set Class 1

This class is intended to cover all known, existing needs

The following needs re-examination:

(existing attribute sets [will] [may] need to be re-specified within this framework).

The following sentence (unchanged from previous draft) is inconsistent with the recommendation in section 2. This needs resolution.

The intent is not to preclude new types of attributes beyond those specified here; it should be possible to add new attribute types to this broad attribute class, if they are relatively orthogonal to the attribute types defined here.
Note: There may be other approaches developed which partition the set of attributes into fundamentally different types; this might result in the definition of a new attribute class that is inconsistent with this class. However, no need for such a separate class has been identified.

The importance of enumerating all of the possible attribute types within this "universal" attribute class is to provide a template for developers of attribute sets, and to set up a framework for interoperability among independently defined attribute sets which are intended to serve various communities. In particular, it should be possible for groups of content experts to develop new Abstract attribute sets, ASN.1 datatypes, comparison operators, and perhaps structure/format attributes which fit comfortably within this framework. Server developers can, based on the template defined here, recognize various attribute types that are omitted in a given query, as well as illegal repetitions or combinations of attributes of given types that would indicate a malformed query.

3.1 General Rules for Class 1

3.1.1 Semantic Precedence and Interaction among Sets

This attribute class is in effect for a query when the OID of an attribute set conformant with this class is specified as the global OID for the Z39.50 query.

Following parenthetical note deleted:

(most likely one of the utility attribute sets which it is proposed below that the ZIG develop).
The "global" OID refers to the object identifier within the type-1 query that does not accompany a specific attribute. For class 1, this is referred to as the dominant OID for the query. When attributes from different attribute sets are mixed within a query, and when the respective attribute set definitions conflict such that the resulting semantics are ambiguous, the semantics of the dominant set prevail.
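One way to picture the role of the dominant OID is the Python sketch below. It is illustrative only: the function names, the dictionary of "readings", and both object identifiers are assumptions made for this sketch, and the architecture does not prescribe how a server records conflicting interpretations.

```python
DOMINANT = "1.2.840.10003.3.1000"   # hypothetical OIDs, for illustration only
OTHER = "1.2.840.10003.3.1001"


def effective_set(attribute_set_id, dominant_oid):
    """An attribute with no set id of its own falls under the global (dominant) OID."""
    return attribute_set_id if attribute_set_id is not None else dominant_oid


def prevailing_semantics(readings, dominant_oid):
    """Pick the interpretation that prevails for an ambiguous combination.

    readings maps an attribute set OID to that set's interpretation of the
    combination. The dominant set's reading prevails; None is returned when
    the dominant set offers no reading.
    """
    return readings.get(dominant_oid)


readings = {DOMINANT: "treat as phrase", OTHER: "treat as word list"}

print(effective_set(None, DOMINANT))             # 1.2.840.10003.3.1000
print(effective_set(OTHER, DOMINANT))            # 1.2.840.10003.3.1001
print(prevailing_semantics(readings, DOMINANT))  # treat as phrase
```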

Any attribute set definition should:

Interaction between attribute sets conformant to this attribute set class and historical attribute sets not conformant to this class within a query operand is undefined.

3.1.2 Inheritance and Population

An attribute set consistent with this attribute class will define attributes of one or more of the types specified in 3.2.

Any class 1 attribute set inherits the rules, prescribed for the class, that apply to attribute types defined for that set. However, a class 1 attribute set need not define nor populate every attribute type defined for class 1. A class 1 attribute set may define as few as one attribute type, or as many as all of the attribute types defined for class 1.

Following sentence deleted (pending resolution of this issue):

It may not define attribute types not defined for class 1.

3.1.3 Omitted Attributes

Following text deleted:
If attributes of a given type are omitted in a query they should be treated as omitted in establishing the semantics of a given query (in other words, there are no defaults for omitted attributes).
And is replaced by:
An attribute set definition should not specify a default value for an attribute type to be applied when that attribute type is omitted from an operand. Each individual server may determine the semantics of omitted attributes. Thus when a client omits an attribute of a given type from an operand (unless that type is not applicable for the given attribute combination, or unless the attribute type is mandatory) the client is, in effect, leaving it to the server to select a value.
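A minimal sketch of this rule from the server's side, in Python, might look as follows. The server policy table, the chosen values, and the function name are hypothetical; only the type numbers (from the summary table in 3.5) and the mandatory status of Comparison come from this document.

```python
ABSTRACT = 1      # class 1 type numbers, from the summary table in 3.5
STOPWORDING = 5
LANGUAGE = 6
COMPARISON = 9    # mandatory for class 1

# Hypothetical policy of one particular server. The attribute set definition
# itself specifies no defaults; each server selects its own values.
SERVER_CHOICES = {STOPWORDING: 1, LANGUAGE: "eng"}


def complete_operand(operand):
    """Fill in server-selected values for attribute types the client omitted.

    operand maps attribute type number to value. A missing mandatory
    attribute is an error rather than something the server may choose.
    """
    if COMPARISON not in operand:
        raise ValueError("mandatory Comparison attribute is missing")
    completed = dict(SERVER_CHOICES)
    completed.update(operand)   # values supplied by the client always win
    return completed


# The Comparison value 3 is an arbitrary placeholder.
print(complete_operand({ABSTRACT: 1, COMPARISON: 3}))
```

A different server could use a different `SERVER_CHOICES` table for the same operand, which is exactly the latitude the rule leaves to servers.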
Following deleted, and re-written as 2.1:

3.1.4 Mutual Exclusivity

Some types of attributes (for example, use and field attributes) are mutually exclusive in a given query operand; these rules are defined at the level of the attribute class rather than specific attribute sets.

3.1.5 Repeatability

In general, if an attribute is allowed to repeat, the semantics of repeating the attribute must be well defined (implicitly or explicitly).

When an attribute set definition is being developed and the need is foreseen for an attribute to repeat, for example when values are orthogonal, it is recommended that the developers consider separating the values into different attribute types, if possible.

Ed. question: what are the implications of the above recommendation in relation to the recommendation that new attribute types not be invented?

While repeatability may be permissible for a given attribute type, as a general principle an attribute type should not be repeated as a substitute for Boolean operations. To amplify this point, an attribute definition might prescribe how to interpret, for example, multiple Abstract attributes in a single operand. For example, the definition might prescribe:

The definition may include a semantic indicator, allowing a client to select among several semantic alternatives. However, none of those alternatives should be to construct separate operands (linked by boolean 'and' or 'or') for each Abstract attribute -- the type-1 query supports boolean operations, so allowing another means of specifying boolean operations would add unnecessary complexity (in contrast to potential semantic interpretations of multiple Abstract attributes which cannot be otherwise represented via the type-1 query, as in the examples above).

3.1.5.1 Mechanism for Repeating Attributes

There are two mechanisms for providing multiple attributes of the same type within an operand:
  1. Via 'list' within 'complex' CHOICE of 'attributeValue' within AttributeElement.
  2. Via separate instances of AttributeElement.
The first mechanism (provided by version 3, and not supported in version 2) is the mechanism prescribed for this class.
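The two mechanisms can be contrasted with a small Python model. This is a sketch only: the `AttributeElement` class below is a simplified stand-in for the ASN.1 structure (tags and the optional semantic component are left out), and the `fold_repeats` helper is hypothetical. It converts the second mechanism (separate instances) into the first (a single instance carrying a list).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AttributeElement:
    """Simplified model of an attribute element within an operand."""
    attribute_type: int
    # Either a single numeric value, or the 'complex' form carrying a 'list'
    value: Union[int, List[Union[int, str]]]


def fold_repeats(elements):
    """Merge separate elements of the same type into one element with a list."""
    merged = {}
    for element in elements:
        values = element.value if isinstance(element.value, list) else [element.value]
        merged.setdefault(element.attribute_type, []).extend(values)
    result = []
    for attr_type, values in merged.items():
        if len(values) == 1 and isinstance(values[0], int):
            result.append(AttributeElement(attr_type, values[0]))   # numeric form
        else:
            result.append(AttributeElement(attr_type, values))      # 'list' form
    return result


# Field Name (type 2) supplied twice as separate instances, plus Comparison (type 9)
separate = [AttributeElement(2, ["Contact"]),
            AttributeElement(9, 1),
            AttributeElement(2, ["Name"])]
print(fold_repeats(separate))
```

The order of values within the merged list follows the order in which the separate instances were supplied, which matters where order carries meaning (as in nesting).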

3.2 Attribute Types Defined within the Attribute Class

3.2.1 Access Point Attribute Types

This attribute class definition recognizes that some applications of Z39.50 make a strong link to database schemes, while others continue to work with abstract definitions of databases. Thus there are two distinct attribute types to accommodate these very different approaches to the use of Z39.50. These two types should not be mixed within an operand.

Following deleted:

3.2.1.1 Nesting of Use Attributes

Multiple Use attributes may occur within an operand, and this may imply nesting (either implicitly or via semantic indicator; multiple occurrences may have a different interpretation, also either implicitly or via a semantic indicator).

The order of nesting should be as in the following example: if field-1, field-2, and field-3 are supplied, in that order, it means field-3 within field-2 within field-1. This rule, though arbitrary and perhaps beyond the scope of architecture, is supplied in order to avoid conflicting definitions, and reduce complexity of implementations supporting multiple attribute sets where nesting is prescribed.

3.2.1.2 Anchored vs. Non-anchored Searching

Whether a search is flat or nested (structured), it should be either implicitly clear, or there should be an explicit indicator (see 3.2.8) designating whether the access point path is anchored or unanchored.

Definitions:

example:
Suppose a schema has elements Name (unstructured) and Creator, structured into sub-elements Name, eMail, and Affiliation: When Field Name attribute Name is specified, as anchored, then it is intended to match the first-level Name; if multiple Field Name attributes Creator and Name are specified, as anchored, then it is intended to match Name within Creator. If the single Field Name attribute Name is specified, as unanchored, then it is intended to match either Name or Name within Creator.

A single wildcard, for example, "Any", may be used in a Field Name path.

And is replaced by:

3.2.1.1 Nesting, Occurrence, and Anchoring of Access Point Attributes

See definitions below.
  1. Nesting of Abstract attributes (for example: Place Name within Subject Heading) is not permitted.
  2. Nesting of Field Name attributes may be supported, and if so, nesting should be indicated by repetition of the Field Name attribute type, where the order of nesting is as in the following example: field 1, field 2, and field 3, supplied in that order, means "field 3 within field 2 within field 1". (This rule is supplied in order to avoid conflicting definitions, and reduce complexity of implementations supporting multiple attribute sets where nesting is prescribed.)
  3. Occurrence of Abstract attributes (for example: second Author) [may/ may not] be supported.
    Ed. Note: Not sure how we ended on this. The opinion was expressed that an abstract access point should not have "occurrences". However there is the example "second author"; on the other hand is the suggestion that, for this example, there should be two access points, "first author" and "second author", or, another argument is that if author has occurrences then it should be cast as a Field Name rather than an Abstract access point.
  4. Occurrence of a Field Name attribute may be supported, but not in conjunction with nesting. So, for example, "second occurrence of field N" may be supported, but not "second occurrence of field M within field N".
  5. When an Abstract attribute is supplied, it is considered anchored.
  6. When one or more Field Name attributes are supplied, these may be indicated as not anchored by defining a wildcard attribute as a value of the attribute type Field Name. In the absence of a wildcard attribute, they are considered anchored.
Definitions:
example of Anchored vs. Not anchored:
Suppose a schema includes elements Description (unstructured) and Contact, where Contact is structured into sub-elements Name, eMail, and Affiliation: When Field Name attribute Description is specified as anchored, then it is intended to match the first level Description; if multiple Field Name attributes Contact and Description are specified, as anchored, then it is intended to match Description within Contact. If the single Field Name attribute Description is specified as not anchored, then it is intended to match either Description, or Description within Contact.
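The distinction can be sketched in Python over a small element tree. The sketch assumes, for illustration, that a Description element also exists within Contact (so that the unanchored case has two candidate matches), that element paths are written outermost first (matching the nesting order given in rule 2), and that anchoring is passed as a flag rather than encoded as a wildcard value; the function name is hypothetical.

```python
# Element paths of a hypothetical schema, outermost element first.
SCHEMA_PATHS = [
    ("Description",),
    ("Contact", "Name"),
    ("Contact", "eMail"),
    ("Contact", "Affiliation"),
    ("Contact", "Description"),   # assumed here for the sake of the example
]


def matching_paths(field_names, anchored=True):
    """Return the schema paths matched by a sequence of Field Name values.

    Anchored: the supplied path must match starting at the root.
    Not anchored: the supplied path may match starting at any node.
    """
    wanted = tuple(field_names)
    if anchored:
        return [path for path in SCHEMA_PATHS if path == wanted]
    return [path for path in SCHEMA_PATHS if path[-len(wanted):] == wanted]


print(matching_paths(["Description"]))
# [('Description',)]
print(matching_paths(["Contact", "Description"]))
# [('Contact', 'Description')]
print(matching_paths(["Description"], anchored=False))
# [('Description',), ('Contact', 'Description')]
```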

3.2.1.2 Mixing Field Name Attributes from Multiple Attribute Sets

Mixing Field Name attributes from multiple attribute sets is permissible, and no attribute set conforming to this class should preclude mixing of its Field Name attributes with Field Name attributes from other sets.

Following added:

By "mixing", we mean for example, in the query
'field 1 within field 2' AND 'field 3 within field 4'
field 1 and field 2 may be defined in one attribute set while field 3 and field 4 are defined in a different set; it is not intended that field 1 and field 2 may be in different sets (or field 3 and field 4); mixing is permitted across operands, not within an operand.
This is a cross-attribute-set rule for any attribute set conforming to class 1.
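Read this way, the rule amounts to a per-operand check, sketched below in Python. The data layout (each operand as a list of (attribute set, field name) pairs), the set labels, and the function name are assumptions made for this sketch.

```python
def field_names_properly_mixed(operands):
    """True if, within every operand, all Field Name attributes share one set.

    operands is a list of operands; each operand is a list of
    (attribute set id, field name) pairs for its Field Name attributes.
    """
    return all(len({set_id for set_id, _name in operand}) <= 1
               for operand in operands)


# 'field 1 within field 2' AND 'field 3 within field 4'
across_operands = [
    [("set-A", "field 2"), ("set-A", "field 1")],
    [("set-B", "field 4"), ("set-B", "field 3")],
]
within_operand = [
    [("set-A", "field 2"), ("set-B", "field 1")],   # not intended
]

print(field_names_properly_mixed(across_operands))  # True
print(field_names_properly_mixed(within_operand))   # False
```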

3.2.2 Query Management Attribute Types

These attributes have the property that they can be rewritten by the server as part of a revised query that the server returns to the client.

3.2.3 Qualifying Attribute Types

3.2.4 Comparison Attribute Type

Defines the relationship between the term in the operand and the term in the term list at the server.

Ed. note: above definition added this draft, as there was no definition in previous draft. Is this definition suitable?

Comparison attributes are strongly typed. There are different comparison attributes for each of the term-value datatypes discussed in 3.3 (numerics, character strings, and language strings).

Comparison attributes are mandatory, non-repeatable and numeric valued.

Comparison attributes are somewhat similar to a generalization of the relation attributes of bib-1, but named differently to avoid confusion. Note that equality is used only for cases of true equality testing (e.g. to test that two numbers are mathematically equal, or that two character strings are lexically equivalent; however equality would not, in general, be used for language strings). Various "matching" comparison operators are used for string matching using various kinds of regular expressions, for example. Sample values might include:

The bib-1 Completeness attribute and most of the Truncation attribute have been folded into the Comparison attribute as forms of anchored matching.

3.2.5 Format/Structure Attribute Type

Used primarily to help with the interpretation of a character string term in cases where the comparison operator normally does not assume an ASN.1 datatype; it provides guidance for the datatype conversion process. This is an enumerated or string-valued attribute. Non-repeatable.

3.2.6 Occurrence Attribute Type

Indicates the desired occurrence of a field as specified by a Field Name attribute ([may/may not] be used with an Abstract attribute). For example "second occurrence of field 1".

3.2.7 Indirection Attribute Type

Indicates that the actual content of the term is not supplied, but instead, a pointer (e.g. URL) is supplied in lieu of the actual term. This attribute has enumerated values, e.g. URL, URN, DOI, etc. Non-repeatable.

Following is deleted:

3.2.8 Anchor Attribute Type

Indicates whether a search is anchored or unanchored (see 3.2.1.2), that is, whether matching is to occur at the root of the element tree, or may begin at any node of the element tree.

3.3 Datatyping

It is recommended that term values have strong datatyping, carrying over into the definition of the comparison attributes (operators); for example, there should be separate comparison attributes for strings, numerics, etc. Groups defining specific Abstract attributes should consider defining ASN.1 datatypes to support their applications -- for example, personal names or dates, or geospatial information (points and polygons). There will of course be cases where the ASN.1 approach to datatyping will be too heavyweight; in those cases the Format/Structure attribute type can be used in conjunction with strings to indicate that the content of a string represents some data in a specific format.

The basic datatypes defined as part of the general attribute class should include:

3.3.1 Additional Types

Attribute set developers may define additional ASN.1 types, for example, for dates, points and polygons.

There is a Z39.50 ASN.1 Date/time definition that may be specified when the term is a date and/or time.

Personal names are an interesting "boundary" case where one might argue either for an ASN.1 based definition or a Format/Structure attribute indicating a normalized name according to some rules; the choice of the appropriate approach is best left to a bibliographic attribute definition working group.

Ed. note: Section 3.4 moved up, to a new section, 2.2.

3.5 Enumeration and Summary of Class 1 Attribute Types

An attribute set definition conformant to class 1 should follow the guidelines and use the numeric values in the summary table below to represent the class 1 types. If any of these types is omitted in an attribute set definition, the definition should skip the value for that type rather than renumber.
Attribute Type            Number  Value                    Repeatable     Occurrence
Abstract                  1       enumerated               no             Abstract or Field Name must occur and are mutually exclusive within an operand
Field Name                2       generally, string        yes            ditto
Weight                    3       numeric: e.g. 0 to 1000  no             optional
Hit Count                 4       numeric                  no             optional
Stopwording               5       0 or 1                   no             optional
Language                  6       string or enumerated     generally, no  optional
Content Authority         7       string                   generally, no  optional
Expansion/interpretation  8       string or enumerated     yes            optional
Comparison                9       enumerated               no             mandatory
Format/Structure          10      string or enumerated     no             optional
Occurrence                11      numeric                  no             optional
Indirection               12      enumerated               no             optional
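The summary table lends itself to a simple operand check, sketched below in Python. The dictionary layout and the function name are assumptions made for this sketch. Types marked "generally, no" are recorded as None and not enforced, and type numbers outside the table are ignored, since whether a class 1 set may define further types is unresolved.

```python
from collections import Counter

# type number -> (name, repeatable); None where the table says "generally, no"
CLASS1_TYPES = {
    1: ("Abstract", False),
    2: ("Field Name", True),
    3: ("Weight", False),
    4: ("Hit Count", False),
    5: ("Stopwording", False),
    6: ("Language", None),
    7: ("Content Authority", None),
    8: ("Expansion/interpretation", True),
    9: ("Comparison", False),
    10: ("Format/Structure", False),
    11: ("Occurrence", False),
    12: ("Indirection", False),
}


def operand_problems(type_numbers):
    """List the table rules violated by the attribute types of one operand."""
    counts = Counter(type_numbers)
    problems = []
    for attr_type in sorted(counts):
        name, repeatable = CLASS1_TYPES.get(attr_type, (None, None))
        if counts[attr_type] > 1 and repeatable is False:
            problems.append(f"{name} is not repeatable")
    if (1 in counts) == (2 in counts):
        problems.append("exactly one of Abstract or Field Name must occur")
    if 9 not in counts:
        problems.append("Comparison is mandatory")
    return problems


print(operand_problems([1, 9]))        # []
print(operand_problems([1, 2, 9]))     # ['exactly one of Abstract or Field Name must occur']
print(operand_problems([2, 2, 9, 9]))  # ['Comparison is not repeatable']
```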

Following deleted:

4. Follow-on Actions

The ZIG should define at least two attribute sets within the new attribute set architecture (perhaps more than two; this is a packaging and granularity question). The ZIG should move away from naming conventions such as "bib-1" which imply some special legitimacy or precedence hierarchy for various attribute sets, and not use names for groups of attribute sets like "CORE". This may help avoid political debates.

One of the attribute sets (to be defined by the ZIG) within this attribute class should cover widely used basic functions, including comparison operator values, language codes, and basic expansion/interpretation values, plus query management types -- call this attribute set, for a working name, "PURPLE".

In addition, the ZIG should define a basic set of use attributes, called, for a working name, "ORANGE". In addition, a committee of bibliographic experts should be established, under auspices such as NISO, to define a new bibliographic attribute set within this general framework.

4. Lessons Learned: Recommendations for Future Enhancements to the Z39.50 Query

As a result of the deliberations over this architecture, limitations posed by the type-1 query have resulted in identification of recommended enhancements that should be considered for a future version of Z39.50. These are documented here (additional contributions to this list are welcome):


Library of Congress