Two problems surfaced about this time. There had been no publicly available written definitions of the semantics of the bib-1 attributes, so implementers were unclear about how to support them. Since this was the only visible set, and since version 2 did not allow a query to use more than a single attribute set, implementers wanted to add whatever attributes they needed to the bib-1 set. Thus the bib-1 set evolved away from the bibliographic community where it had started, and it grew without plan or rigor until it was viewed as a kind of global set. A document detailing the semantics was updated in 1995, but it is not part of the standard.
In 1994 and 1995, as Z39.50 version 3 was being finalized and as Z39.50 began to be widely implemented, the Z39.50 Implementors Group faced questions about the relationships among attribute sets that other groups were developing, notably the STAS and GILS attribute sets. Clifford Lynch prepared a discussion paper for the February 1996 ZIG meeting that detailed the issues:
The major conclusion of the group was that a new architecture for attribute sets should be developed; they went on to recommend an architecture based on classes of attribute sets, with expanded attribute types. Another major conclusion was that expert communities, rather than the ZIG, should be responsible for developing and maintaining attribute sets (as was the case with GILS and STAS). Notably, they recommended that the bibliographic community, rather than the ZIG, develop the next generation of bibliographic attributes. The ZIG should continue to be responsible for attributes that are general to Z39.50, that is, not specific to a given community.
Ed. Note: List of meeting participants and other significant contributors to be listed here.
The type-1 query consists of one or more search terms, each with a set of attributes, specifying, for example, the type of term (author, title, subject, etc.), whether the term is truncated, and its structure. The server is responsible for mapping attributes to the logical design of the database.
A term in a type-1 query, together with its accompanying collection of attributes, is referred to as an operand. Operands may be combined in a type-1 query, linked by boolean operators (And, Or, And-not, and Proximity).
Each attribute is a pair: an attribute type and a value of that type. An Attribute set defines a set of attribute types, and for each, a list of possible values.
An attribute set definition is assigned an object identifier, referred to as its attribute set identifier.
Example: The bib-1 attribute set defines a number of attribute types; one of which is Use. For bib-1 Use attributes, many attribute values are defined, one of which is personal name. Each type is assigned a numeric value, and each value is assigned either a numeric value or a string. In bib-1, type Use is assigned the value 1, and Personal Name is assigned the value 1. Thus bib-1 Use attribute Personal Name is represented as the pair (1,1). This pair is further qualified by the bib-1 attribute set identifier (1.2.840.10003.3.1) to distinguish it from the pair (1,1) that may be defined by other attribute sets.
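The qualified pair in the example above can be modeled as a small value type. The following is a minimal sketch in Python; the class name `Attribute` and its field names are illustrative assumptions, not part of Z39.50, and the second OID is a made-up placeholder. Only the bib-1 identifier and the pair (1,1) come from the example.

```python
from dataclasses import dataclass
from typing import Union

# Object identifier of the bib-1 attribute set, as given in the example above.
BIB1_OID = "1.2.840.10003.3.1"


@dataclass(frozen=True)
class Attribute:
    """One attribute: a (type, value) pair qualified by its attribute set id.

    The class and field names are hypothetical, chosen for illustration.
    """
    attribute_set: str        # attribute set identifier (an OID)
    attr_type: int            # numeric attribute type, e.g. 1 = Use in bib-1
    value: Union[int, str]    # numeric or string attribute value


# bib-1 Use attribute "Personal Name": type 1, value 1.
personal_name = Attribute(BIB1_OID, 1, 1)

# The same pair (1,1) under a different attribute set id is a different
# attribute, because the set identifier is part of its identity.
other = Attribute("1.2.3.4.5", 1, 1)  # placeholder OID, not a real attribute set
```

Comparing `personal_name` with `other` yields inequality even though both carry the pair (1,1), which is the distinction the attribute set identifier is meant to provide.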
In version 2 of Z39.50, all attributes within a query must belong to a single attribute set (the query accommodates only a single, global attribute set id). In version 3, attributes may be combined from different attribute sets, within a single query, even within a single operand (an attribute set id may accompany every attribute). This is a significant enhancement, providing support for multiple database searching, and allowing attribute sets to be defined with less replication.
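The difference between the two versions can be expressed as a simple check. This is a sketch only: the function name `allowed_in_version` and the triple representation of an attribute are assumptions made for illustration, and the second OID in the usage lines is a placeholder rather than a registered attribute set.

```python
from typing import Iterable, Tuple, Union

# An attribute is modeled here as (attribute_set_oid, type, value).
AttributeTriple = Tuple[str, int, Union[int, str]]


def allowed_in_version(attributes: Iterable[AttributeTriple], version: int) -> bool:
    """Return True if the attributes may appear together in one query.

    Version 2: every attribute must belong to a single attribute set.
    Version 3: attributes from different sets may be combined.
    """
    if version >= 3:
        return True
    distinct_sets = {oid for oid, _type, _value in attributes}
    return len(distinct_sets) <= 1


# Usage: one attribute from bib-1 and one from a placeholder set.
mixed = [("1.2.840.10003.3.1", 1, 1), ("1.2.3.4.5", 2, "title")]
mixed_ok_v2 = allowed_in_version(mixed, 2)
mixed_ok_v3 = allowed_in_version(mixed, 3)
```

Under these assumptions the mixed query is rejected for version 2 and accepted for version 3.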
Also in version 3, new data types for terms are defined (in version 2 only binary values are allowed).
Replaced by:
When specifying, for example, "field-1 within field-2", it will be possible to specify an occurrence, for example, "second occurrence of field-1, within field-2", but not, for example, "second occurrence of field-1, within second occurrence of field-2". That is, only a single node may carry occurrence, likely either the root or the leaf, and this should be statically specified within the attribute set definition. (This is a limitation of class 1, not a limitation of the type-1 query, and may be overcome by classes other than 1.)
Occurrence is not permitted in conjunction with nesting. Thus for example "field 1 within field 2" (nesting), or "second occurrence of field 3" (occurrence) may be specified but not "second occurrence of field 1, within field 2". This is a limitation posed by the type-1 query. In the future (either as an amendment to the type-1 query, or as part of a new query) nesting should be cast as an operator; thus in "A within B", 'within' would be analogous to 'and' in "A and B".
Following text deleted as these are now resolved.
1.6 Some Unresolved Issues
- Though there is clear consensus within the ZIG that nesting of Field Name attributes should be supported, there is division of opinion on whether nesting should be permitted for Abstract attributes.
- Similarly, there is division of opinion on whether specification of occurrence should be permitted for Abstract attributes.
- Some feel that the limitation described in 1.5.2, for occurrence, can be overcome, for example by use of the 'complex' form of attributeValue.
- It is not clear whether anchoring is sufficiently specified. See 3.2.1.2.
- 3.1.2 states that a class 1 attribute set may not define any attribute types not defined for class 1 (i.e. not defined in this document). Some feel this is overly restrictive.
Following added:
This architecture strongly recommends that an attribute set definition conforming to a particular class not include types that are not defined for that class.
Ed. Note: It seems that this question is still unresolved, that is, how strongly worded a position should the architecture document take on this point?
(At least one attribute set class definition will be developed, but it is not clear that more than one will be necessary.)
An attribute class may declare that specific attribute types are mutually exclusive within a query operand (for example, Abstract and Field Name attributes of Class 1). Mutual exclusivity rules are to be defined at the level of the attribute class rather than specific attribute sets.
Although many attribute values are (and perhaps will continue to be) enumerated, an attribute value may take any of the following forms:
Ed. above example added this draft.
The following needs re-examination:
(existing attribute sets [will] [may] need to be re-specified within this framework).
The following sentence (unchanged from previous draft) is inconsistent with the recommendation in section 2. This needs resolution.
The intent is not to preclude new types of attributes beyond those specified here; it should be possible to add new attribute types to this broad attribute class, if they are relatively orthogonal to the attribute types defined here.
Note: There may be other approaches developed which partition the set of attributes into fundamentally different types; this might result in the definition of a new attribute class that is inconsistent with this class. However, no need for such a separate class has been identified.
The importance of enumerating all of the possible attribute types within
this "universal" attribute class is to provide a template for developers
of attribute sets, and to set up a framework for interoperability among
independently defined attribute sets which are intended to serve various
communities. In particular, it should be possible for groups of content
experts to develop new Abstract attribute sets, ASN.1 datatypes, comparison
operators, and perhaps structure/format attributes which fit comfortably
within this framework. Server developers can, based on the template
defined here, recognize various attribute types that are omitted in a
given query, as well as illegal repetitions or combinations of attributes
of given types that would indicate a malformed query.
Following parenthetical note deleted:
The "global" OID refers to the object identifier within the type-1 query that does not accompany a specific attribute. For class 1, this is referred to as the dominant OID for the query. When attributes from different attribute sets are mixed within a query, and when the respective attribute set definitions conflict such that the resulting semantics are ambiguous, the semantics of the dominant set prevail.
(most likely one of the utility attribute sets which it is proposed below that the ZIG develop).
Any attribute set definition should:
Any class 1 attribute set inherits the rules, prescribed for the class, that apply to attribute types defined for that set. However, a class 1 attribute set need not define nor populate every attribute type defined for class 1. A class 1 attribute set may define as few as one attribute type, or as many as all of the attribute types defined for class 1.
Following sentence deleted (pending resolution of this issue):
It may not define attribute types not defined for class 1.
And is replaced by:
If attributes of a given type are omitted in a query, they should be treated as omitted in establishing the semantics of that query (in other words, there are no defaults for omitted attributes).
An attribute set definition should not specify a default value for an attribute type to be applied when that attribute type is omitted from an operand. Each individual server may determine the semantics of omitted attributes. Thus when a client omits an attribute of a given type from an operand (unless that type is not applicable for the given attribute combination, or unless the attribute type is mandatory) the client is, in effect, leaving it to the server to select a value.
Following deleted, and re-written as 2.1:
3.1.4 Mutual Exclusivity
Some types of attributes (for example, use and field attributes) are mutually exclusive in a given query operand; these rules are defined at the level of the attribute class rather than specific attribute sets.
When an attribute set definition is being developed and the need is foreseen for an attribute to repeat, for example when values are orthogonal, it is recommended that the developers consider separating the values into different attribute types, if possible.
Ed. question: what are the implications of the above recommendation in relation to the recommendation that new attribute types not be invented?
While repeatability may be permissible for a given attribute type, as a
general principle, an attribute type should not be repeated as a
substitute for Boolean operations. To amplify this point, an attribute
definition might prescribe how to interpret, for example, multiple Abstract
attributes in a single operand. For example, the definition might
prescribe:
Following deleted:
or
- multiple Use attributes imply nesting; thus if Use attributes use-1, use-2, and use-3 are specified in a single operand, in that order, it means search for use-3 within use-2 within use-1 (see "Nesting of Use-type Attributes" 3.2.1.1).
The order of nesting should be as in the following example: if field-1, field-2, and field-3 are supplied, in that order, it means field-3 within field-2 within field-1. This rule, though arbitrary and perhaps beyond the scope of architecture, is supplied in order to avoid conflicting definitions, and reduce complexity of implementations supporting multiple attribute sets where nesting is prescribed.
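The ordering rule can be illustrated with a few lines of Python. This is a sketch, not a prescribed implementation; the function name `nesting_path` is hypothetical, and the field names are the ones used in the example above.

```python
from typing import List


def nesting_path(field_names: List[str]) -> str:
    """Describe the nesting implied by repeated Field Name attributes.

    The first name supplied is the outermost field; the last name
    supplied is the innermost field, the one that contains the term.
    """
    if not field_names:
        raise ValueError("at least one Field Name attribute is required")
    # The description reads innermost-first, so reverse the supplied order.
    return " within ".join(reversed(field_names))


description = nesting_path(["field-1", "field-2", "field-3"])
```

Fixing one order in the architecture means a server that supports several attribute sets needs only one such interpretation rather than one per set.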
Definitions:
example:
Suppose a schema has elements Name (unstructured) and Creator, structured into sub-elements Name, eMail, and Affiliation:
- Name
- Creator
  - Name
  - Affiliation
When Field Name attribute Name is specified as anchored, then it is intended to match the first-level Name; if multiple Field Name attributes Creator and Name are specified as anchored, then it is intended to match Name within Creator. If the single Field Name attribute Name is specified as unanchored, then it is intended to match either Name or Name within Creator.
A single wildcard, for example, "Any", may be used in a Field Name path.
And is replaced by:
3.2.1.1 Nesting, Occurrence, and Anchoring of Access Point Attributes
See definitions below.
- Nesting of Abstract attributes (for example: Place Name within Subject Heading) is not permitted.
- Nesting of Field Name attributes may be supported, and if so, nesting should be indicated by repetition of the Field Name attribute type, where the order of nesting is as in the following example: field 1, field 2, and field 3, supplied in that order, means "field 3 within field 2 within field 1". (This rule is supplied in order to avoid conflicting definitions, and reduce complexity of implementations supporting multiple attribute sets where nesting is prescribed.)
- Occurrence of Abstract attributes (for example: second Author) [may/may not] be supported.
Ed. Note: Not sure how we ended on this. The opinion was expressed that an abstract access point should not have "occurrences". However, there is the example "second author". On the other hand, there is the suggestion that, for this example, there should be two access points, "first author" and "second author". Another argument is that if author has occurrences then it should be cast as a Field Name rather than an Abstract access point.
- Occurrence of a Field Name attribute may be supported, but not in conjunction with nesting. So, for example, "second occurrence of field N" may be supported, but not "second occurrence of field M within field N".
- When an Abstract attribute is supplied, it is considered anchored.
- When one or more Field Name attributes are supplied, these may be indicated as not anchored by defining a wildcard attribute as a value of the attribute type Field Name. In the absence of a wildcard attribute, they are considered anchored.
Definitions:
- Nesting is the ability to specify that the field that contains the term must be within another specified field.
- Occurrence is the ability to specify the occurrence of the field that contains the term.
- Anchored means that matching must occur from the root of the element tree.
- Not anchored means that matching may occur beginning at any node within the element tree.
Example of Anchored vs. Not anchored:
Suppose a schema includes elements Description (unstructured) and Contact, where Contact is structured into sub-elements Name, eMail, and Affiliation:
- Description
- Contact
  - Name
  - Description
When Field Name attribute Description is specified as anchored, then it is intended to match the first-level Description; if multiple Field Name attributes Contact and Description are specified as anchored, then it is intended to match Description within Contact. If the single Field Name attribute Description is specified as not anchored, then it is intended to match either Description, or Description within Contact.
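The anchored and not-anchored cases of the example above can be sketched as path matching over the element tree. This is one possible reading, offered as an illustration: the names `SCHEMA_PATHS` and `match_paths` are hypothetical, the placement of Name and Description under Contact follows the example, and treating "not anchored" as a match on the tail of a path is an assumption about how a server might implement it.

```python
from typing import List, Tuple

Path = Tuple[str, ...]

# Element paths of the example schema, written from the root down.
SCHEMA_PATHS: List[Path] = [
    ("Description",),
    ("Contact",),
    ("Contact", "Name"),
    ("Contact", "Description"),
]


def match_paths(field_names: Path, anchored: bool) -> List[Path]:
    """Return the schema paths selected by a sequence of Field Name attributes.

    Anchored: the supplied names must match starting at the root.
    Not anchored: the supplied names may match starting at any node,
    so they only need to form the tail of a path.
    """
    n = len(field_names)
    if anchored:
        return [p for p in SCHEMA_PATHS if p == field_names]
    return [p for p in SCHEMA_PATHS if len(p) >= n and p[-n:] == field_names]


first_level = match_paths(("Description",), anchored=True)
nested = match_paths(("Contact", "Description"), anchored=True)
either = match_paths(("Description",), anchored=False)
```

Here `first_level` selects only the top-level Description, `nested` selects only Description within Contact, and `either` selects both, mirroring the three cases in the example.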
Following added:
By "mixing", we mean, for example, that in the query
'field 1 within field 2' AND 'field 3 within field 4'
field 1 and field 2 may be defined in one attribute set while field 3 and field 4 are defined in a different set; it is not intended that field 1 and field 2 may be in different sets (or field 3 and field 4); mixing is permitted across operands, not within an operand. This is a cross-attribute-set rule for any attribute set conforming to class 1.
Note: A Character Set attribute is not proposed; the current thinking is that it is unnecessary since it is handled by general Z39.50 character set support. See Character Set and Language Negotiation.
Ed. Note: We need to reconsider this.
Ed. note: above definition added this draft, as there was no definition in previous draft. Is this definition suitable?
Comparison attributes are strongly typed. There are different comparison attributes for each of the term-value datatypes discussed in 3.3 (numerics, character strings, and language strings).
Comparison attributes are mandatory, non-repeatable and numeric valued.
Comparison attributes are somewhat similar to a generalization of the
relation attributes of bib-1, but named differently to avoid confusion.
Note that equality is used only for cases of true equality testing (e.g.
to test that two numbers are mathematically equal, or that two character
strings are lexically equivalent; however, equality would not, in general,
be used for language strings).
Various "matching" comparison operators are used for string matching using
various kinds of regular expressions, for example. Sample values might
include:
Following is deleted:
3.2.8 Anchor Attribute Type
indicates whether a search is anchored or unanchored (see 3.2.1.2), that is, whether matching is to occur at the root of the element tree, or may begin at any node of the element tree.
The basic datatypes defined as part of the general attribute class should include:
There is a Z39.50 ASN.1 Date/time definition, that may be specified when the term is a date and/or time.
Personal names are an interesting "boundary" case where one might argue either for an ASN.1 based definition or a Format/Structure attribute indicating a normalized name according to some rules; the choice of the appropriate approach is best left to a bibliographic attribute definition working group.
Ed. note: Section 3.4 moved up, to a new section, 2.2.
| Attribute Type | Number | Value | Repeatable | Occurrence |
|---|---|---|---|---|
| Abstract | 1 | enumerated | no | Abstract or Field Name must occur and are mutually exclusive within an operand |
| Field Name | 2 | generally, string | yes | ditto |
| Weight | 3 | numeric, e.g. 0 to 1000 | no | optional |
| Hit Count | 4 | numeric | no | optional |
| Stopwording | 5 | 0 or 1 | no | optional |
| Language | 6 | string or enumerated | generally, no | optional |
| Content Authority | 7 | string | generally, no | optional |
| Expansion/interpretation | 8 | string or enumerated | yes | optional |
| Comparison | 9 | enumerated | no | mandatory |
| Format/Structure | 10 | string or enumerated | no | optional |
| Occurrence | 11 | numeric | no | optional |
| Indirection | 12 | enumerated | no | optional |
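The rules in the table lend themselves to a mechanical check of a single operand. The following is a sketch under stated assumptions: the function name `check_operand`, the representation of an operand as a list of (type, value) pairs, and the wording of the messages are all illustrative. Language and Content Authority are marked "generally, no" in the table, so this sketch does not enforce non-repeatability for them. The attribute values used in the usage lines are placeholders.

```python
from collections import Counter
from typing import List, Sequence, Tuple, Union

# Attribute type numbers taken from the table above.
ABSTRACT, FIELD_NAME, COMPARISON = 1, 2, 9

# Types the table marks as repeatable "no". Types 6 and 7 are
# "generally, no" and are deliberately left out of this set.
NON_REPEATABLE = {1, 3, 4, 5, 9, 10, 11, 12}


def check_operand(attrs: Sequence[Tuple[int, Union[int, str]]]) -> List[str]:
    """Return a list of rule violations for one operand's (type, value) pairs."""
    counts = Counter(attr_type for attr_type, _value in attrs)
    problems = []
    has_abstract = counts[ABSTRACT] > 0
    has_field = counts[FIELD_NAME] > 0
    if has_abstract and has_field:
        problems.append("Abstract and Field Name are mutually exclusive")
    if not has_abstract and not has_field:
        problems.append("Abstract or Field Name must occur")
    if counts[COMPARISON] == 0:
        problems.append("Comparison is mandatory")
    for attr_type in sorted(NON_REPEATABLE):
        if counts[attr_type] > 1:
            problems.append(f"attribute type {attr_type} is not repeatable")
    return problems


# Usage: a well-formed operand with nested Field Names, and a malformed one.
ok = check_operand([(2, "Contact"), (2, "Description"), (9, 1)])
bad = check_operand([(1, 4), (2, "Description")])
```

An empty result means the operand passes these particular checks; it says nothing about whether a given server supports the attribute combination.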
Following deleted:
4. Follow-on Actions
The ZIG should define at least two attribute sets within the new attribute
set architecture (perhaps more than two; this is a packaging and
granularity question). The ZIG should move away from naming conventions
such as "bib-1" which imply some special legitimacy or precedence
hierarchy for various attribute sets, and not use names for groups of
attribute sets like "CORE". This may help avoid political debates.
One of the attribute sets (to be defined by the ZIG) within this attribute class should cover widely used basic functions, including comparison operator values, language codes, and basic expansion/interpretation values, plus query management types -- call this attribute set, for a working name, "PURPLE".
In addition, the ZIG should define a basic set of use attributes, called, for a working name, "ORANGE". Further, a committee of bibliographic experts should be established, under auspices such as NISO, to define a new bibliographic attribute set within this general framework.
Library of Congress