Z39.50
Utility
Attribute Set
Draft 3
July 1999
Status: The initial version of the Utility set will be defined following the August ZIG meeting. Changes resulting from discussion at the meeting will be applied, numbers will be assigned to the attribute values, and an object identifier will be assigned. This draft, draft 3, is essentially unchanged from a preliminary version of draft 3 issued May 21 (with comments solicited by July 1). There were no comments.
The Utility attribute set defines values for the attribute types defined for Class 1, as specified by the Z39.50 Attribute Architecture.
Both the Utility set and the Cross Domain set define values that are independent of any particular domain or community -- values that would otherwise need to be defined in several attribute sets.
The Utility set defines commonly used values for the Class 1 types, as well as metadata access points for records, as distinguished from metadata access points for resources; the latter are the province of the Cross Domain set.
This distinction between record and resource is illustrated by the example of a MARC record that describes a document: the MARC record is the "record" and the document is the "resource". This does not mean that the attribute architecture (or Class 1) always models record and resource as distinct. When the record is the resource (e.g. in a document database), the metadata access points are the province of the Cross Domain set.
For example, 'Language' is an access point in both sets. Utility set Access Point attribute 'Language' refers to the language of the database record, while Cross Domain set Access Point attribute 'Language' refers to the language of the resource, that is, the value of the record's language field.
A MARC record, created in English, might describe a French book. The Utility Access Point attribute Language would refer to the language of the MARC record, while the Cross Domain Access Point attribute Language would refer to the language of the book (English and French, respectively).
Following is a list of the Class 1 attribute types, and for each type, the values defined by the Utility set.
- Access Point
- Record Date
A date associated with the record. May be used in conjunction with a semantic qualifier, such as "date/time created", "date/time last modified", or "date/time last reviewed", or it may be used unqualified, in which case it is a date/time associated with the record, assigned by the server.
- Record Creator
A name identifying the creator of the database record. May be qualified in the same manner as the Access Point value 'Name' in the Cross Domain set.
- Language of Record
The language of the database record, expressed as a character string based on RFC 1766.
- Local Control Number of Record
A string (integer or character string) assigned by the server that uniquely identifies a record in the database.
- Cost to Retrieve Record
The cost that the client will incur if it subsequently retrieves the full record. This access point may be used, for example, to limit a search to records whose cost of retrieval is less than a specified amount. (When the server does not charge for retrieval, a cost of zero may be assumed.)
- Record Format
The format of the database record. This access point may be used to limit the results to include records of a specific format.
- Score
See Rank.
- Rank
Access points Score and Rank pertain to relevancy ranking for ranked retrieval. The server may assign a score and/or a rank to a result set record; the score or rank applies to the record relative to other records in the result set. A score is an integer between zero and 100. The rank of a record is an integer from 1 to N, where N is the result set size. (Note that a higher score means more relevant, while a lower rank means more relevant.) It is assumed that if record A has a better (higher) score than record B, then it will also have a better (lower) rank. However, score and rank differ in the following respect: no two records in the result set have the same rank, and for every integer between 1 and N there is exactly one record with that rank; on the other hand, more than one result set record may share the same score, and there need not be a record for every possible score. Both attributes are defined because some ranked retrieval systems score records while others rank them. Either attribute may be used to restrict a result to some threshold value, for example "records with a score greater than 50" or "records with a rank less than 10".
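The relationship between Score and Rank can be sketched in Python. This is a minimal illustration, not part of the attribute set: it assumes the server holds an integer score per record and derives ranks from it, breaking score ties by record identifier (an arbitrary choice; the set only requires that every rank from 1 to N be used exactly once). The function names and record identifiers are hypothetical.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, int]) -> Dict[str, int]:
    """Derive ranks 1..N from scores 0..100 (higher score -> lower, better rank).

    Ties in score are broken by record identifier; this tie-break rule is an
    assumption, since the attribute set only requires ranks to be unique.
    """
    for record_id, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {record_id} out of range: {score}")
    ordered = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return {rid: position for position, rid in enumerate(ordered, start=1)}


def score_above(scores: Dict[str, int], threshold: int) -> List[str]:
    """Records satisfying 'Score .greater than. threshold'."""
    return sorted(rid for rid, s in scores.items() if s > threshold)


def rank_below(ranks: Dict[str, int], threshold: int) -> List[str]:
    """Records satisfying 'Rank .less than. threshold', best rank first."""
    return sorted((rid for rid, r in ranks.items() if r < threshold), key=ranks.get)


scores = {"rec-a": 80, "rec-b": 95, "rec-c": 80, "rec-d": 10}
ranks = ranks_from_scores(scores)
print(ranks)  # {'rec-b': 1, 'rec-a': 2, 'rec-c': 3, 'rec-d': 4}
print(score_above(scores, 50))  # ['rec-a', 'rec-b', 'rec-c']
print(rank_below(ranks, 3))  # ['rec-b', 'rec-a']
```

Note that rec-a and rec-c share a score but receive distinct ranks, which is exactly the difference between the two attributes described above.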
- Result Set Position
This attribute may be used to limit the size of the result set. For example, the operand "Result Set Position .less than or equal. 10" could be included in the query to restrict the result set to 10 or fewer records. Note that the Rank access point can be used for the same purpose; this attribute is defined because a server might support this function even if it does not support ranking. The rank of a record is not necessarily its result set position; furthermore, the result set position of a record might change (if the result set is sorted) but its rank should not change.
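A short sketch of why rank and result set position can differ. It assumes a hypothetical ResultRecord type whose rank is assigned once and never changes, and treats positions as 1-based indexes into the current ordering of the result set; the names and titles are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResultRecord:
    record_id: str
    title: str
    rank: int  # fixed once assigned by a (hypothetical) ranking step


def limit_by_position(result_set: List[ResultRecord], max_position: int) -> List[ResultRecord]:
    """Apply 'Result Set Position .less than or equal. max_position'.

    Positions are 1-based, so position p is result_set[p - 1].
    """
    return result_set[:max(max_position, 0)]


# Result set initially in rank order.
result_set = [
    ResultRecord("r1", "Zebra habitats", rank=1),
    ResultRecord("r2", "Antelope migration", rank=2),
    ResultRecord("r3", "Meerkat behaviour", rank=3),
]

# Sorting by title changes each record's position, but not its rank.
sorted_set = sorted(result_set, key=lambda r: r.title)
for position, record in enumerate(sorted_set, start=1):
    print(position, record.record_id, record.rank)
# 1 r2 2
# 2 r3 3
# 3 r1 1

print([r.record_id for r in limit_by_position(sorted_set, 2)])  # ['r2', 'r3']
```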
- All Access Points
When the origin uses 'All Access Points' it is asking the server to search for the term via all supported access points.
- Anywhere in Record
When the origin uses 'Anywhere in Record' it is asking the server to search for the term anywhere it may occur within the record, subject to the server's search capability and its interpretation of what "anywhere within the record" means. The server might search commonly used access points, or it might search the entire content of the record.
- Server Choice
When the origin uses 'Server Choice' it is asking the server to select an access point (which may be defined in any attribute set, not necessarily the Utility set), and to use its best judgment in making that selection.
- Wildcard
May be used as a wildcard matching exactly one element of a path. Thus:
"field-1 within wildcard within field-3" would match "field-1 within field-2 within field-3" and it would match "field-1 within field-4 within field-3" but would not match "field-1 within field-2 within field-4 within field-3".
- Wildpath
May be used for an unanchored search. Thus:
"field-1 within wildpath" would match "field-1", or "field-1 within field-2", or "field-1 within field-2 within field-3", etc.
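One way a server might evaluate Wildcard and Wildpath is sketched below. It assumes a path is held as a list of element names from innermost to outermost (so "field-1 within field-2" becomes ["field-1", "field-2"]) and that the two values are represented by hypothetical sentinel strings. This is a sketch rather than a prescribed algorithm; it reproduces the examples given for both values.

```python
from functools import lru_cache
from typing import Sequence

# Hypothetical sentinel tokens standing for the Utility values 'Wildcard' and
# 'Wildpath'; a real query would carry them as Access Point attribute values.
WILDCARD = "<wildcard>"
WILDPATH = "<wildpath>"


def path_matches(pattern: Sequence[str], path: Sequence[str]) -> bool:
    """Match a 'within' chain, innermost element first.

    WILDCARD matches exactly one element; WILDPATH matches zero or more.
    """
    pat, pth = tuple(pattern), tuple(path)

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(pat):
            return j == len(pth)
        if pat[i] == WILDPATH:
            # Either match zero elements, or absorb one element and try again.
            return match(i + 1, j) or (j < len(pth) and match(i, j + 1))
        if j == len(pth):
            return False
        if pat[i] in (WILDCARD, pth[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


print(path_matches(["field-1", WILDCARD, "field-3"], ["field-1", "field-2", "field-4", "field-3"]))  # False
print(path_matches(["field-1", WILDPATH], ["field-1", "field-2", "field-3"]))  # True
```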
- Semantic Qualifier
- Null
The Null value of the Semantic Qualifier attribute may be supplied within a query operand, when that operand also includes one or more additional Semantic Qualifier attribute values (defined in this or another attribute set).
The server pairs each supplied value of the Semantic Qualifier attribute with the Access Point attribute to try to find a best match. (The server determines what match is best.)
By including the value Null, the client indicates that the server may, if it chooses, ignore the other semantic qualifiers (thus allowing the server to match the Access Point attribute value with Null, in effect rendering the Access Point unqualified).
Whenever one or more Semantic Qualifiers are included in an operand, the server must either select one or fail the search. Thus the Null value is defined in order to allow the server to choose not to select any of the real Semantic Qualifiers (i.e. those other than Null). The value Null should not be supplied unless one or more other Semantic Qualifier values are also supplied.
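A server's handling of Null might be sketched as follows. The sketch assumes the server knows which (access point, qualifier) pairs it supports and, as an arbitrary policy, prefers a supported real qualifier over Null; the attribute set leaves the choice of best match to the server. The same pattern applies to the Null values of Content Authority and Functional Qualifier. All names here are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

NULL = "Null"  # hypothetical token for the Utility 'Null' qualifier value


class SearchFailed(Exception):
    """No supplied qualifier could be paired with the access point."""


def select_qualifier(
    access_point: str,
    qualifiers: Sequence[str],
    supported: Set[Tuple[str, str]],
) -> Optional[str]:
    """Return the chosen qualifier, or None to search the access point unqualified.

    Preference follows the order in which the client supplied the real
    qualifiers; that ordering is an assumption, since the server alone
    decides what match is best.
    """
    for qualifier in qualifiers:
        if qualifier != NULL and (access_point, qualifier) in supported:
            return qualifier
    if not qualifiers or NULL in qualifiers:
        return None  # paired with Null: the access point is effectively unqualified
    raise SearchFailed(f"no supported qualifier for {access_point!r}: {list(qualifiers)}")


supported = {("Record Date", "date/time created")}
print(select_qualifier("Record Date", ["date/time last reviewed", NULL], supported))  # None
```

Without Null in the list, the same request would fail the search, because none of the real qualifiers can be paired with Record Date.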
- Language
The Language attribute indicates the language of the supplied term. It is a character string based on RFC 1766.
- Content Authority
- Null
The Null value of the Content Authority attribute is used in a manner analogous to the Null value of the Semantic Qualifier attribute. It may be supplied within a query operand, when that operand also includes one or more other Content Authority attribute values.
The server pairs each supplied value of the Content Authority attribute with the Access Point attribute to try to find a best match. (The server determines what match is best.)
By including the value Null, the client indicates that the server may, if it chooses, ignore the other Content Authority values (thus allowing the server to match the Access Point attribute value with Null, in effect rendering the Access Point unqualified).
Whenever one or more Content Authority values are included in an operand, the server must either select one or fail the search. Thus the Null value is defined in order to allow the server to choose not to select any of the real Content Authority values (i.e. those other than Null).
The Utility set defines only the value Null. Real Content Authority values are to be defined in other attribute sets. The value Null should not be supplied unless one or more other Content Authority values are also supplied.
- Expansion/Interpretation
- Left Truncation
- Right Truncation
- Word-by-word Left Truncation
- Word-by-word Right Truncation
- Phonetic
- Stem
- Thesaural Expansion
- Singular Matching
- Plural Matching
- Case Sensitive
- Case Insensitive
- Punctuation Sensitive
- No Stopwords
Client requests the server not to treat any word in the term as a stopword.
- Search Words Stopped
Meaningful only in a response query. Used by the server to indicate that it treated one or more words in the term as stopwords. (It may be supplied in a submitted query, but the server should ignore it. The server should neither infer any semantics from its occurrence in a submitted query nor treat that occurrence as an error, because the client may simply have resubmitted a query previously returned by the server, possibly reformulated, in which the server had included this attribute.)
- Normalized Weight
The weight assigned to the term, for purposes of ranking or assigning scores to records. An integer from 0 to 1000. May be attached to a term in a request query (submitted by the client in a search request) in which case the supplied value indicates the weight that the client requests be attached to the term. May be attached to a term in a response query (returned by the server in a search response) in which case the supplied value indicates the weight that the server attached to the term, or the weight that the server recommends for use in a re-submitted query. When supplied by the server the value may be the same as, or different from, the value in the submitted query. May be supplied by the server even if it was not supplied in the request query.
- Hit Count
May be attached to a term in a query returned in the Search response; its value is the number of records in which the term occurs. Meaningful only in a returned query. It may occur in a submitted query, but the server should ignore it. (The server should neither infer any semantics from its occurrence nor treat that occurrence as an error, because the client may simply have resubmitted a query previously returned by the server, possibly reformulated, in which the server had included a Hit Count attribute.)
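The following sketch shows a server annotating the terms of a response query with Hit Count and Normalized Weight, and ignoring response-only attributes (Hit Count, Search Words Stopped) when a query is resubmitted. The weighting formula is purely hypothetical (the attribute set only fixes the range 0 to 1000), and records are modeled as sets of words for simplicity.

```python
from typing import Dict, List, Set

# Attributes meaningful only in a response query; a server ignores them (without
# raising an error) when a client resubmits a previously returned query.
RESPONSE_ONLY = {"Hit Count", "Search Words Stopped"}


def annotate_terms(terms: List[str], records: Dict[str, Set[str]]) -> List[Dict[str, object]]:
    """Build response-query terms carrying Hit Count and Normalized Weight."""
    total = len(records)
    annotated: List[Dict[str, object]] = []
    for term in terms:
        hits = sum(1 for words in records.values() if term in words)
        # Hypothetical weighting: rarer terms weigh more, scaled into 0..1000.
        weight = round(1000 * (total - hits) / total) if total else 0
        annotated.append({"term": term, "Hit Count": hits, "Normalized Weight": weight})
    return annotated


def accept_submitted(attributes: Dict[str, object]) -> Dict[str, object]:
    """Drop response-only attributes from a submitted term; keep everything else."""
    return {name: value for name, value in attributes.items() if name not in RESPONSE_ONLY}


records = {"rec-1": {"utility", "attribute"}, "rec-2": {"attribute"}, "rec-3": {"set"}}
response = annotate_terms(["attribute", "utility"], records)
print(response)
print(accept_submitted(response[0]))  # {'term': 'attribute', 'Normalized Weight': 333}
```

Normalized Weight survives resubmission because, unlike Hit Count, a client may legitimately supply it in a request query.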
- Comparison
- The following two values are used to test for the existence of an access point, not its value. When the comparison is Always Matches, it succeeds if and only if a value of the access point exists. When the comparison is Never Matches, it succeeds if and only if no value of the access point exists. These two values are used with a null term:
- Always Matches
- Never Matches
- The following are used for numerical and scalar comparisons (Integer and IntUnits; Externals and other expressions that evaluate to scalar values, including date/time; and character strings, for lexical comparisons):
- Equal
- Less Than
- Less Than Or Equal
- Greater Than
- Greater Than Or Equal
- Not Equal
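The existence tests and scalar comparisons can be illustrated with a small evaluator. It assumes a record's value for the access point is None when no value exists, and that a missing value satisfies no scalar comparison (including Not Equal), which the attribute set does not specify. Python's ordering of str values (by code point) stands in for lexical comparison, and datetime.date for date/time values.

```python
import operator
from datetime import date
from typing import Any, Callable, Dict, Optional

SCALAR_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "Equal": operator.eq,
    "Less Than": operator.lt,
    "Less Than Or Equal": operator.le,
    "Greater Than": operator.gt,
    "Greater Than Or Equal": operator.ge,
    "Not Equal": operator.ne,
}


def evaluate(comparison: str, value: Optional[Any], term: Optional[Any] = None) -> bool:
    """Evaluate 'access point <comparison> term' for one record.

    value is the record's value for the access point, or None if it has none.
    """
    if comparison == "Always Matches":  # existence test; term is null
        return value is not None
    if comparison == "Never Matches":  # non-existence test; term is null
        return value is None
    if value is None:
        return False  # assumption: a missing value satisfies no scalar comparison
    return SCALAR_COMPARISONS[comparison](value, term)


print(evaluate("Always Matches", "1999-07-01"))  # True
print(evaluate("Less Than", date(1999, 7, 1), date(1999, 8, 1)))  # True
print(evaluate("Greater Than Or Equal", "zig", "attribute"))  # True
```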
- For range comparison (may be used with an external datatype, for example, term-1 or term-2):
- Between
- Contained In Bounding-polygon
- For character strings:
- Contains
- Relevance Feedback
- Regular Expression
As prescribed by IEEE 1003.2 Volume 1, Section 2.8, "Regular Expression Notation".
- Masking
The rules governing the usage of this attribute are as follows:
The character '?' (question mark) is used to mask a variable number of characters. It may be followed by a positive integer, written as one or more consecutive decimal digits of which the first is non-zero. The integer is formed from the digits beginning immediately after the '?' up to, but not including, the first non-digit character, and indicates the range of characters to mask: from zero up to and including that number.
When '?' is not immediately followed by a non-zero decimal digit, it masks an arbitrary number of characters (from zero up to a system-defined limit).
The character '#' (pound or number sign) is used to mask exactly one character. Multiple consecutive occurrences of '#' may be used to mask a precise number of characters.
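The masking rules map naturally onto regular expressions. The sketch below translates a masked term into a Python regex; this translation is one possible implementation, not one required by the attribute set, and system_limit is a hypothetical parameter standing in for the system-defined limit on a bare '?'.

```python
import re
from typing import Optional


def mask_to_regex(term: str, system_limit: Optional[int] = None) -> str:
    """Translate a masked term into an equivalent Python regular expression.

    '?N' (N starting with a non-zero digit) masks 0..N characters; a bare '?'
    masks any number of characters, up to system_limit if one is given;
    '#' masks exactly one character. Everything else is matched literally.
    """
    parts = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "?":
            j = i + 1
            if j < len(term) and term[j] in "123456789":
                while j < len(term) and term[j] in "0123456789":
                    j += 1
                parts.append(".{0,%d}" % int(term[i + 1:j]))
                i = j
                continue
            parts.append(".*" if system_limit is None else ".{0,%d}" % system_limit)
        elif ch == "#":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def masked_match(term: str, value: str, system_limit: Optional[int] = None) -> bool:
    """True if the whole value matches the masked term."""
    return re.fullmatch(mask_to_regex(term, system_limit), value, re.DOTALL) is not None


print(mask_to_regex("col?1r"))  # col.{0,1}r
print(masked_match("col?1r", "color"), masked_match("col?1r", "colour"))  # True False
print(masked_match("wom#n", "women"))  # True
```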
- Format/structure
No values defined in this set.
- Occurrence
Indicates the desired occurrence of the specified access point. Integer. For example, to indicate second author, the value of the Access Point attribute is 'author', and the value of the Occurrence attribute is 2.
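A minimal sketch of Occurrence, assuming a record is modeled as a mapping from access point names to lists of values in record order. The 1-based counting follows the second-author example; the function name is hypothetical, and rejecting values below 1 is an assumption.

```python
from typing import Dict, List, Optional


def nth_occurrence(record: Dict[str, List[str]], access_point: str, occurrence: int) -> Optional[str]:
    """Return the value at the given 1-based Occurrence of an access point, if any."""
    if occurrence < 1:
        raise ValueError("Occurrence must be a positive integer")
    values = record.get(access_point, [])
    return values[occurrence - 1] if occurrence <= len(values) else None


record = {"author": ["Smith, J.", "Jones, A."], "title": ["Utility Attribute Set"]}
print(nth_occurrence(record, "author", 2))  # Jones, A.
```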
- Indirection
- URI
The term supplied is a pointer (a URI) to the actual term. For example, the supplied term might be a URL, in which case the server should follow the link to the resource and substitute it as the term. This would be used, for example, for relevance feedback when a document is intended as the term and a URL to the document is supplied.
- Scan Display Term
The term was previously provided by the server in a Scan Response, not as an actual term in the term list, but rather as a 'displayTerm' associated with an actual term. See 'term' and 'displayTerm' within TermInfo in the Scan ASN.1 definition. (This applies when the actual term in the term list is not suitable for display although it is the term that should be used in a query rather than the display term.) The server should substitute the actual term for the display term.
- Functional Qualifier
- Null
The Null value of the Functional Qualifier attribute is used in a manner analogous to the Null value of the Semantic Qualifier attribute. It may be supplied within a query operand, when that operand also includes one or more additional Functional Qualifier attribute values.
The server pairs each supplied value of the Functional Qualifier attribute with the Access Point attribute to try to find a best match. (The server determines what match is best.)
By including the value Null, the client indicates that the server may, if it chooses, ignore the other Functional Qualifier values (thus allowing the server to match the Access Point attribute value with Null, in effect rendering the Access Point unqualified).
Whenever one or more Functional Qualifier values are included in an operand, the server must either select one or fail the search. Thus the Null value is defined in order to allow the server to choose not to select any of the real Functional Qualifier values (i.e. those other than Null).
- Date/time Created
May be used to qualify a date/time Access Point. May be used with Utility set Access Point attribute Record Date to mean "Date/time record added to database".
- Date/time Last Modified
May be used to qualify a date/time Access Point. May be used with Utility set Access Point attribute Record Date to mean "Date/time record last modified".
- Date/time Last Reviewed
May be used to qualify a date/time Access Point. May be used with Utility set Access Point attribute Record Date to mean "Date/time record last reviewed".