28th February 1999
Version 0.2b (minor changes only from 0.2)
SCCSID ("@(#)/src/z39.50/ssl/zthes/SCCS/s.zthes.html 1.8")
It is intended that the abstract model described here is sufficiently general that it can also be implemented by protocols and data formats other than Z39.50. As an example, an appendix defines an XML DTD for thesaurus terms based on the model, and includes an example XML document using that DTD.
The current version of this profile does not mandate or describe any relationship between a thesaurus and a database. The model is that terms from any thesaurus database may be used to search any other database. A subsequent version of this profile may provide specifications for ``tied thesauri'' which have particular relevance to one or more known databases.
This profile does not attempt to address the issue of multilingual thesauri as described in ISO 5964. A subsequent profile, or a subsequent version of this one, may do so.
Each individual term in a thesaurus is represented by a record in the database. In the interests of simplicity and orthogonality, even non-preferred terms must be represented by their own records.
Term records consist of an initial part describing the term itself (with information such as its unique identifier, scope note, etc.), together with sub-records briefly describing related terms. The primary means of navigation from one term to another is by searching for the unique identifiers of the terms related to the first one.
|1||mandatory, not repeatable|
|[0,1]||optional, not repeatable|
The top level term record is composed of the following elements:
|termId||1||an opaque string of characters which uniquely identifies the term within the thesaurus|
|termName||1||the name of the term in a form which may be displayed to a user or used as a search term in a target database|
|termQualifier||[0,1]||an additional string which, if supplied, qualifies the term name such that the combination of term and qualifier is unique within the thesaurus|
|termType||[0,1]||an indication of the type of the term, chosen from the controlled vocabulary described below|
|termLanguage||[0,1]||the language of the term|
|termNote||[0,1]||a scope note for the term: that is, arbitrary prose clarifying the meaning and scope of the term|
|termCreatedDate||[0,1]||the date on which the record defining the term was created|
|termCreatedBy||[0,1]||the name of the person who created the record defining the term|
|termModifiedDate||[0,1]||the date on which the record defining the term was last modified|
|termModifiedBy||[0,1]||the name of the person who last modified the record defining the term|
|relation||0+||a sub-record, in the format described below, briefly describing a term related to this one|
It is recognised that in many thesauri there is no explicit unique identifier field, and the term itself, perhaps in combination with the qualifier, uniquely identifies a record. Such thesauri must nevertheless provide a termId field, which may be generated automatically, for example by combining the term and qualifier.
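As an illustration of the automatic generation mentioned above, the following is a minimal sketch in Python. The `Term` class, the `make_term_id` function and the `/` separator are all assumptions made for this example; the profile does not prescribe how the term and qualifier are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    """Hypothetical stand-in for the identifying part of a term record."""
    term_name: str
    term_qualifier: Optional[str] = None


def make_term_id(term: Term, separator: str = "/") -> str:
    """Derive a termId from the term name and, if present, its qualifier.

    The separator is an assumption of this sketch, not part of the profile.
    """
    if term.term_qualifier:
        return f"{term.term_name}{separator}{term.term_qualifier}"
    return term.term_name


print(make_term_id(Term("mercury", "planet")))  # mercury/planet
print(make_term_id(Term("video art")))          # video art
```

A generated identifier of this kind is unique only if the separator never occurs inside a term name or qualifier, so a real implementation would need to choose or escape the separator accordingly.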
The termType element may take the following values:
Servers may return other values of termType at their discretion. It is recommended that such extension values begin with the string ``X-''.
Each relation sub-record is composed of the following elements:
|relationType||1||an indication of the type of the relation, chosen from the controlled vocabulary described below|
|sourceDb||[0,1]||if specified, the host, port and name of a different database in which the related term is found; otherwise, the related term is in the same database as the current one|
|termId||1||the unique identifier of the related term within its database|
|termName||1||the name of the related term|
|termQualifier||[0,1]||the qualifier of the related term|
|termType||[0,1]||the type of the related term|
|termLanguage||[0,1]||the language of the related term|
The relationType element may take the following values:
Servers may return other values of relationType at their discretion. It is recommended that such extension values begin with the string ``X-''.
This profile deliberately restricts its set of supported relations to those discussed in ISO 2788, in the belief that it is better for a small set of relations to be used interoperably than for a larger set to be specified, with different servers and clients in practice using different subsets.
The ``NT'' and ``BT'' relationships are reciprocal; so are ``USE'' and ``UF''; and ``RT'' is symmetric. That is, when any term T1 points to another T2 using the relation ``NT'', T2 should point back to T1 using ``BT'' and vice versa; when T1 points to T2 using the relation ``USE'', T2 should point back to T1 using ``UF'' and vice versa; and when T1 points to T2 using the relation ``RT'', T2 should point back to T1 using the same relation.
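The pairing rules above lend themselves to a mechanical consistency check. The sketch below is one possible way to do it in Python; the in-memory shape of `thesaurus` (a mapping from termId to a list of relation pairs) and the function name `missing_reciprocals` are assumptions made for illustration, not part of the profile.

```python
# Expected inverse for each relation type named in the paragraph above.
INVERSE = {"NT": "BT", "BT": "NT", "USE": "UF", "UF": "USE", "RT": "RT"}


def missing_reciprocals(thesaurus):
    """Return (source, relationType, target) triples whose inverse link is absent.

    `thesaurus` maps a termId to a list of (relationType, related termId)
    pairs. Extension relation types (such as "X-..." values) and targets
    that are not keys of the mapping are skipped rather than reported.
    """
    problems = []
    for source, relations in thesaurus.items():
        for rel_type, target in relations:
            inverse = INVERSE.get(rel_type)
            if inverse is None or target not in thesaurus:
                continue
            if (inverse, source) not in thesaurus[target]:
                problems.append((source, rel_type, target))
    return problems


sample = {
    "T1": [("NT", "T2"), ("RT", "T3")],
    "T2": [("BT", "T1")],
    "T3": [],  # lacks the RT link back to T1
}
print(missing_reciprocals(sample))  # [('T1', 'RT', 'T3')]
```

Terms held in a different database (those with a sourceDb element) cannot be checked this way, which is why unknown targets are silently skipped here.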
The termType element in a relation sub-record may take the same values as in the top-level record.
Support for additional searches may be useful.
As such, it requires servers and clients to support version 3 of the Z39.50 protocol.
In designing the Zthes-1 attribute set of additional attributes required for thesaurus navigational searches, we have sought to comply with the guidelines expounded in the Attribute Set Developers Guide.
Unusually for a Z39.50 profile, the intention of this profile is that it be used in conjunction with other profiles. It is envisaged that an application will use the Zthes profile to navigate a thesaurus and thereby obtain terms suitable for searching in another database; and use a second, domain-specific profile such as GILS or CIMI to search in and retrieve from that database.
(Tags 4, 5 and 6 have more general application, and so should probably be moved into tagSet-M.)
|1||tagSet-M, defined in appendix TAG.2.1 of the Z39.50 standard|
|2||tagSet-G, defined in appendix TAG.2.2 of the Z39.50 standard|
|3||application-defined string tags|
|4||tagSet-Zthes, defined above|
The abstract schema described in section 2.2 is represented in Z39.50 by a GRS-1 record encoded with the tag-paths specified in the following table. Where possible, standard tags from tagSet-M and tagSet-G are re-used; in these cases, the generic names of the tags are listed in the right-hand column.
|Tag Path||Occurrence||Element||Generic Name|
The termLanguage element is expressed as one of the standard codes described in RFC 1766 and ISO 639: for example, ``en'' for English, ``fr'' for French and ``de'' for German.
The administrative date fields should be returned in the ASN.1 GeneralizedTime format. (The working group considered the Z39.50 ASN.1 date/time definition, but reached the conclusion that the benefits would be outweighed by the barrier raised to implementation.)
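For illustration, a date value could be rendered as a GeneralizedTime string along the following lines. This is a sketch only: the function name `to_generalized_time` is hypothetical, and it emits just the UTC form YYYYMMDDHHMMSSZ, whereas GeneralizedTime also permits other forms (fractional seconds, local time) that are not handled here.

```python
from datetime import datetime, timezone


def to_generalized_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC GeneralizedTime string.

    Only the YYYYMMDDHHMMSSZ form is produced by this sketch.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")


created = datetime(1999, 2, 28, 12, 30, 0, tzinfo=timezone.utc)
print(to_generalized_time(created))  # 19990228123000Z
```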
The person-name elements, termCreatedBy and termModifiedBy, may be returned in whatever format is convenient for the server: this profile does not attempt to address the interpretation of such administrative information across multiple databases.
The sourceDb element should be returned in the form of a z39.50s URL as described in RFC 2056. For example, if the related term is in the database called ``aat'' on the server running on port 3950 on the host foo.bar.org, then the sourceDb element should have the value z39.50s://foo.bar.org:3950/aat.
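A client following a relation into another database needs to take such a value apart again. The sketch below shows one way to do so with Python's standard `urllib.parse` module; the function name `parse_source_db` is hypothetical, and only the simple host, port and database shape of the example above is handled.

```python
from urllib.parse import urlsplit


def parse_source_db(source_db: str):
    """Split a sourceDb value into (host, port, database name).

    The port is None when the URL does not specify one; choosing a
    default is left to the caller in this sketch.
    """
    parts = urlsplit(source_db)
    if parts.scheme != "z39.50s":
        raise ValueError(f"not a z39.50s URL: {source_db!r}")
    database = parts.path.lstrip("/")
    return parts.hostname, parts.port, database


print(parse_source_db("z39.50s://foo.bar.org:3950/aat"))
# ('foo.bar.org', 3950, 'aat')
```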
Servers may, at their discretion, include additional tagSet-M, tagSet-G and string-tagged (type 3) elements in the records they return; they may include such additional elements at the top level, within relation sub-records, or both. Clients may display any such additional elements as they see fit, or may ignore them.
This element set may be useful when constructing a summary of several records found by a search for initial entry points to a thesaurus; it is unlikely to be useful when navigating from term to term.
|1||1||termQualifier||searches in the termQualifier element of the top-level term record|
|1||2||termType||searches in the termType element of the top-level term record|
|1||3||createdBy||searches for the name of the person who created a record|
|1||4||modifiedBy||searches for the name of the person who last modified a record|
(Access points 3 and 4 should probably be moved into either the utility attribute set (since they pertain to the record itself), or to the cross-domain attribute set (since they correspond to date searches supported by that attribute set). This indeterminacy raises the broader issue of which of these attribute sets should support searching for administrative dates: currently both do!)
The Zthes-1 attribute set conforms to attribute set class 1 as described in the Z39.50 Attribute Architecture. It should not be used as the dominant set in a query; when formulating queries to search Zthes databases, the utility or cross-domain attribute set should be dominant.
|Attribute Set||Type||Value||Search For||Generic Name|
For the purpose of searches on the local number access point, values of the termId function as opaque ``magic cookies''. Therefore, such search terms should not include any contentAuthority attribute, even if it happens that for the specific thesaurus in question, the termId identifiers are taken from a well-known source.
The following additional access points may optionally be supported:
|Attribute Set||Type||Value||Search For||Generic Name|
|utility||1||1||termCreatedDate||date/time added to database|
|utility||1||2||termModifiedDate||date/time last modified|
|cross-domain||1||3||termCreatedBy and termModifiedBy||name (Note 1)|
|cross-domain||1||10||termLanguage||language (Note 2)|
(If we specify these searches, then we probably don't need the createdBy and modifiedBy attributes in zthes-1; would it follow that they're not needed in the utility set? The more general question is whether these are properly utility or cross-domain attributes.)
|||National Information Standards Organization. ANSI/NISO Z39.50-1995. Information Retrieval (Z39.50): Application Service Definition and Protocol Specification. Bethesda, MD: NISO Press, 1995. Also available at http://www.loc.gov/z3950/agency/document.html|
|||International Organization for Standardization. ISO 2788: Guidelines for the establishment and development of monolingual thesauri, 2nd ed. Geneva: ISO, 1986.|
|||Z39.50 Maintenance Agency. Z39.50 Attribute Architecture, Draft of November 1998. Available at http://www.loc.gov/z3950/agency/attrarch/arch.html|
|||Z39.50 Maintenance Agency. Z39.50 Utility Attribute Set, Draft 1 of October 1, 1998. Available at http://www.loc.gov/z3950/agency/attrarch/util-d1.html|
|||Ralph LeVan. A Cross-Domain Attribute Set, version 1.2 of 1998/11/16. Available at http://www.oclc.org/~levan/docs/crossdomainattributeset.html|
|||George Percivall. Attribute Set Developers Guide, annotated outline of 18th September 1998. Available at http://harp.gsfc.nasa.gov/~eric/attr_set_developers_guide.html|
|||International Organization for Standardization. ISO 5964: Guidelines for the establishment and development of multilingual thesauri. Geneva: ISO, 1985.|
|||H. Alvestrand. RFC 1766: Tags for the Identification of Languages. March 1995. Available at ftp://ftp.uu.net/inet/rfc/rfc1766.Z|
|||International Organization for Standardization. Prepared by ISO/TC 37, Terminology (principles and coordination). ISO 639:1988 (E/F): Code for the representation of names of languages, 1st edition, 1988.|
|||R. Denenberg, J. Kunze, D. Lynch. RFC 2056: Uniform Resource Locators for Z39.50. November 1996. Available at ftp://ftp.uu.net/inet/rfc/rfc2056.Z|
|||Z39.50 Maintenance Agency. Z39.50 Date/Time Definition, April 6, 1998 (amended February 17, 1999). Available at http://www.loc.gov/z3950/agency/defns/date.html|
<!-- Zthes DTD
     Based on Z39.50 Profile for Thesaurus Navigation, version 0.1 (20 Feb 1999)
     Version of DTD: 25 Feb 1999 -->

<!-- #PCDATA: parseable character data = text
     occurrence indicators (default: required, not repeatable):
       ?: zero or one occurrence (optional)
       *: zero or more occurrences (optional, repeatable)
       +: one or more occurrences (required, repeatable)
       |: choice, one or the other, but not both -->

<!ENTITY % term "termId, termName, termQualifier?, termType?, termLanguage?">
<!ENTITY % admin "termCreatedDate?, termCreatedBy?, termModifiedDate?, termModifiedBy?">

<!ELEMENT Zthes (%term;, termNote?, %admin;, relation*)>
<!ELEMENT relation (relationType, sourceDb?, %term;)>

<!ELEMENT termId (#PCDATA)>
<!ELEMENT termName (#PCDATA)>
<!ELEMENT termQualifier (#PCDATA)>
<!ELEMENT termType (#PCDATA)>
<!ELEMENT termLanguage (#PCDATA)>
<!ELEMENT termNote (#PCDATA)>
<!ELEMENT termCreatedDate (#PCDATA)>
<!ELEMENT termCreatedBy (#PCDATA)>
<!ELEMENT termModifiedDate (#PCDATA)>
<!ELEMENT termModifiedBy (#PCDATA)>
<!ELEMENT relationType (#PCDATA)>
<!ELEMENT sourceDb (#PCDATA)>
(This appendix should include a crosswalk with any pre-existing thesaurus DTDs if appropriate.)
<?xml version="1.0"?>
<!DOCTYPE Zthes SYSTEM "zthes.dtd">
<Zthes>
  <termId>102067</termId>
  <termName>video art</termName>
  <termType>NT</termType>
  <termNote>
    Use for works of art that employ video technology, especially
    videotapes. For the study and practice of the art of producing
    such works, use "video."
  </termNote>
  <relation>
    <relationType>UF</relationType>
    <termId>102067/001</termId>
    <termName>art, video</termName>
    <termType>ND</termType>
  </relation>
  <relation>
    <relationType>BT</relationType>
    <termId>185191</termId>
    <termName>[time-based works]</termName>
    <termType>NL</termType>
  </relation>
  <relation>
    <relationType>RT</relationType>
    <termId>54153</termId>
    <termName>video</termName>
    <termType>NT</termType>
  </relation>
  <relation>
    <relationType>RT</relationType>
    <termId>253827</termId>
    <termName>video artists</termName>
    <termType>NT</termType>
  </relation>
</Zthes>
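A record in this format can be read with any XML parser. As a minimal sketch, the following Python code uses the standard `xml.etree.ElementTree` module to list the related terms of a record; the function name `related_terms` is hypothetical, and the embedded record is an abbreviated copy of the example above so that the code is self-contained. ElementTree does not validate against the DTD, so this shows extraction only.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example record, embedded for self-containment.
RECORD = """<Zthes>
  <termId>102067</termId>
  <termName>video art</termName>
  <relation>
    <relationType>UF</relationType>
    <termId>102067/001</termId>
    <termName>art, video</termName>
  </relation>
  <relation>
    <relationType>BT</relationType>
    <termId>185191</termId>
    <termName>[time-based works]</termName>
  </relation>
</Zthes>"""


def related_terms(xml_text):
    """Return (relationType, termId, termName) for each relation sub-record."""
    root = ET.fromstring(xml_text)
    return [
        (rel.findtext("relationType"), rel.findtext("termId"), rel.findtext("termName"))
        for rel in root.findall("relation")
    ]


for rel_type, term_id, name in related_terms(RECORD):
    print(rel_type, term_id, name)
```

The termId values returned here are what a client would use as search terms to navigate to each related term.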
Several other people have also expressed an interest in implementing this profile.