SRU (Search/Retrieval Using URL)

The CQL Context Set (version 1.1)

See also version 1.2.

The CQL context set defines a set of indexes, relations, relation modifiers and boolean modifiers. The indexes it supplies are 'utility' indexes, which do not directly reference any data; they are for cases where a query must express a concept not directly related to the records.

Historical note: In CQL version 1.0, this was the 'srw' index set. Implementers may wish to accept 'srw' as a reserved name for the identifier '//www.loc.gov/zing/cql/srw-indexes/v1.0/', with the same semantics as below. srw.resultSetName has been renamed to cql.resultSetId for consistency.

  • The reserved name for this context set is: cql
  • The identifier for this context set is: info:srw/cql-context-set/1/cql-v1.1

Sections: Indexes | Relations | Relation Modifiers | Relation Qualifiers | Boolean Modifiers


Indexes

  • resultSetId
    A search clause may be a result set id. This is a special case, where the index and relation are expressed as "cql.resultSetId =" and the term is the result set id returned by the server in the 'resultSetId' parameter of the response. It may be used by itself in a query to refer to an existing result set from which records are desired. It may also be used in conjunction with other resultSetId clauses or other indexes, combined by boolean operators. The semantics of resultSetId with relations other than "=" is undefined.
  • serverChoice
    This is the default when the index and relation are omitted from a search clause. 'cql.serverChoice' means that the server will choose an index for the given term. The relation used is 'scr', hence 'cql.serverChoice scr "term"' is equivalent to the search clause '"term"'.
  • anywhere
    This means "search all indexes from all context sets you know". (By contrast, cql.serverChoice means essentially "search any index -- your choice -- from any context set you know".)
  • allRecords
    A special index which matches every record available. Every record is matched no matter what values are provided for the relation and term, but the recommended syntax is: cql.allRecords = 1.
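As an illustration of how a server might apply these defaults, the following Python sketch fills in an omitted index and relation, and maps the legacy 'srw' prefix from the historical note onto 'cql'. The function name normalize_clause and its (index, relation, term) tuple representation of a search clause are hypothetical; no particular CQL parser is assumed.

```python
from typing import Optional, Tuple

Clause = Tuple[str, str, str]  # (index, relation, term)

def normalize_clause(index: Optional[str], relation: Optional[str],
                     term: str) -> Clause:
    """Fill in cql defaults for a parsed search clause (hypothetical helper)."""
    if index is None and relation is None:
        # A bare term is equivalent to: cql.serverChoice scr "term"
        return ("cql.serverChoice", "scr", term)
    if index is None or relation is None:
        raise ValueError("index and relation must be given together")
    prefix, _, name = index.rpartition(".")
    if prefix == "srw":
        # Legacy CQL 1.0 'srw' set, accepted with the cql semantics.
        prefix = "cql"
        if name == "resultSetName":
            name = "resultSetId"  # renamed for consistency
    return (f"{prefix}.{name}" if prefix else name, relation, term)
```

For example, normalize_clause(None, None, "fish") returns ("cql.serverChoice", "scr", "fish"), and a legacy 'srw.resultSetName = rs1' clause becomes 'cql.resultSetId = rs1'.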

Relations

Implicit Relations
These relations are defined as such in the grammar of CQL. The cql context set only defines their meaning, rather than their existence.

  • <, >, <=, and >= retain their regular meanings as relations pertaining to ordered terms
  • = is used:
    • For word adjacency, when the term is a list of words: the words must appear in that order with no other words intervening.
    • Otherwise, for exact equality of value.
  • <> is 'not equal to'.
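The implicit relations can be sketched as follows. This assumes, hypothetically, that the caller already knows whether a term is a word list, that words are whitespace-separated and compared case-sensitively, and that ordered values arrive as comparable Python values such as numbers. The names implicit_match and words_adjacent are illustrative only.

```python
import operator
from typing import Any, List

# Ordered-term relations plus '<>', applied to already comparable values.
ORDERED = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
           ">=": operator.ge, "<>": operator.ne}

def words_adjacent(db_words: List[str], term_words: List[str]) -> bool:
    """True if term_words occur in db_words in order, with nothing between."""
    n = len(term_words)
    return any(db_words[i:i + n] == term_words
               for i in range(len(db_words) - n + 1))

def implicit_match(db_value: Any, relation: str, term: Any,
                   word_list: bool = False) -> bool:
    if relation == "=":
        if word_list:
            # '=' on a list of words means word adjacency.
            return words_adjacent(str(db_value).split(), str(term).split())
        # Otherwise '=' means exact equality of value.
        return db_value == term
    if relation in ORDERED:
        return ORDERED[relation](db_value, term)
    raise ValueError(f"not an implicit relation: {relation!r}")
```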

Default Relations
These relations are defined as being widely useful as part of a default context set.

  • scr is used to mean "server choice relation". It is used when the client wishes the server to choose the most appropriate relation for the index or term. It is assumed when the relation is omitted.
  • exact is used for exact string matching, when the term is a character string. =/cql.string is synonymous.
  • all and any may be used when the term contains multiple items to indicate "all of these items" or "any of these items". These queries could be expressed using boolean AND and OR respectively. These relations have an implicit relation modifier of 'cql.word'.
  • within may be used with a search term that has multiple dimensions. It matches if the database's term falls completely within the range, area or volume described by the search term. For example: dc.date within "2002 2003"
  • encloses may be used when the index's data has multiple dimensions. It matches if the database's term fully encloses the search term. For example: xxx.dateRange encloses 2002
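A minimal sketch of the default relations all, any, within and encloses follows, assuming whitespace-separated words and one-dimensional ranges written as "low high" (a single value is treated as a point). The partial flag is one reading of the cql.partial relation qualifier: it relaxes "completely within" and "fully encloses" to any overlap. All function names here are hypothetical.

```python
from typing import Set, Tuple

Range = Tuple[float, float]

def _words(text: str) -> Set[str]:
    return set(text.split())

def all_match(db_value: str, term: str) -> bool:
    # Every word of the term must occur (like AND of the words).
    return _words(term) <= _words(db_value)

def any_match(db_value: str, term: str) -> bool:
    # At least one word of the term must occur (like OR of the words).
    return bool(_words(term) & _words(db_value))

def parse_range(term: str) -> Range:
    """Parse "low high" (or a single point) into a (low, high) pair."""
    parts = [float(p) for p in term.split()]
    if len(parts) == 1:
        return (parts[0], parts[0])
    if len(parts) == 2 and parts[0] <= parts[1]:
        return (parts[0], parts[1])
    raise ValueError(f"not a one-dimensional range: {term!r}")

def _overlaps(a: Range, b: Range) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]

def within(db: Range, search: Range, partial: bool = False) -> bool:
    if partial:
        return _overlaps(db, search)  # only some of db need be inside
    # The database range must fall completely inside the search range.
    return search[0] <= db[0] and db[1] <= search[1]

def encloses(db: Range, search: Range, partial: bool = False) -> bool:
    if partial:
        return _overlaps(db, search)  # db need only partly cover search
    # The database range must fully contain the search range.
    return db[0] <= search[0] and search[1] <= db[1]
```

Under these assumptions, a record dated 2002 satisfies 'dc.date within "2002 2003"': within(parse_range("2002"), parse_range("2002 2003")) is True.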

Relation Modifiers

Term Functions
These relation modifiers request that the server perform some algorithm on each item within the term before processing. If named algorithms are required, then further context sets should define relation modifiers for these.

  • stem
    The server should apply a stemming algorithm to the words within the term. For example, 'computing' and 'computer' would both match the stem 'compute'.
  • relevant
    The server should use a relevancy algorithm for determining matches and the order of the result set.
  • phonetic
    The server should use a phonetic algorithm for determining words which sound like the term.
  • fuzzy
    The server should be liberal in what it counts as a match. The exact details of this are left up to the server, but might include permutations of character order, off-by-one for numerical terms and so forth.
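Because the details of cql.fuzzy are left to the server, the following is only one possible policy, built from the two examples given above: off-by-one for integer terms and, as a deliberately narrow case of permuted character order, a single swap of two adjacent characters. The function name fuzzy_match is hypothetical.

```python
def fuzzy_match(db_value: str, term: str) -> bool:
    """One possible cql.fuzzy policy; the spec leaves the details to the server."""
    if db_value == term:
        return True
    # Integer terms: match only values that are off by one.
    try:
        return abs(int(db_value) - int(term)) <= 1
    except ValueError:
        pass  # not both integers; fall through to the character check
    # Character order: accept exactly one swap of two adjacent characters.
    if len(db_value) != len(term):
        return False
    diffs = [i for i, (a, b) in enumerate(zip(db_value, term)) if a != b]
    return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and db_value[diffs[0]] == term[diffs[1]]
            and db_value[diffs[1]] == term[diffs[0]])
```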

Relation Qualifiers

These modifiers qualify the relation to more precisely determine its semantics.

  • partial
    When used with within or encloses, part of one term may extend outside the other. This permits the database term to fall only partially within the search term (with within), or to only partially enclose it (with encloses).

Term Format
These relation modifiers describe the format or structure of the term in some fashion.

  • word
    The term should be broken into words, according to the server's definition of a 'word'.
  • string
    The term is a single item, and should not be broken up.
  • isoDate
    Each item within the term conforms to the ISO 8601 specification for expressing dates.
  • number
    Each item within the term is a number.
  • uri
    Each item within the term is a URI.
  • masked (default modifier)
    The following masking rules and special characters apply to search terms unless overridden in a profile via a relation modifier. To explicitly request this functionality, add 'cql.masked' as a relation modifier.
    1. A single asterisk (*) is used to mask zero or more characters.
    2. A single question mark (?) is used to mask a single character, thus N consecutive question-marks means mask N characters.
    3. Caret/hat (^) is used as an anchor character for terms that are word lists, that is, where the relation is 'all' or 'any', or '=' when used for word adjacency. It may not be used to anchor a string, that is, when the relation is 'exact' (string matches are, by default, anchored). It may occur at the beginning or end of a word (with no intervening space): at the beginning it anchors the word to the start of the field, and at the end it anchors the word to the end of the field. "^" has no special meaning when it occurs within a word (not at the beginning or end) or within a string, but it must be escaped nevertheless.
    4. Backslash (\) is used to escape '*', '?', quote (") and '^', as well as itself. A backslash not immediately followed by one of these characters is an error.
      See masking examples below.
  • unmasked
    Do not apply masking rules.
  • oid
    The term is an ISO object identifier in dot-separated format. For example:
            'zeerex.set exact/cql.oid "1.2.840.10003.3.1"'
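A server might check the term-format modifiers with something like the following sketch. It assumes that items are whitespace-separated (except under cql.string, where the whole term is one item), uses Python's datetime.fromisoformat, which accepts only a subset of ISO 8601 (a broader subset from Python 3.11), and approximates a URI as anything starting with an RFC 3986 scheme followed by ':'. The names term_conforms and split_items are hypothetical.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List

def split_items(term: str, modifier: str) -> List[str]:
    # cql.string: the term is a single item. Others: assume whitespace-separated.
    return [term] if modifier == "string" else term.split()

def _is_iso_date(item: str) -> bool:
    try:
        datetime.fromisoformat(item)  # subset of ISO 8601 only
        return True
    except ValueError:
        return False

CHECKS: Dict[str, Callable[[str], bool]] = {
    "isoDate": _is_iso_date,
    "number": lambda s: re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)", s) is not None,
    "uri": lambda s: re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", s) is not None,
    "oid": lambda s: re.fullmatch(r"\d+(\.\d+)*", s) is not None,
}

def term_conforms(term: str, modifier: str) -> bool:
    check = CHECKS.get(modifier)
    if check is None:
        return True  # word, string: no constraint on the items themselves
    items = split_items(term, modifier)
    return bool(items) and all(check(item) for item in items)
```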

Masking examples:

  1. dc.title = c*t (matches cat and coast etc.)
    dc.title = "*fish food*" (matches unanchored 'fish food')
  2. dc.title = c?t (matches cat and cot, not coast or ct)
    " ?" (matches any single character)
  3. dc.title = "^cat in the hat" (matches 'cat in the hat' where it is at the beginning of the field)
    dc.title any "^cat ^dog eats rat" (matches 'cat eats rat', 'dog eats cat', 'cat loves bat', but not 'bat loves cat')
  4. dc.title = "\"Of Course\" she said"
    dc.identifier exact "\\\"\^\*\?andSomeMoreCharacters"
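The masking rules can be sketched as a translation of one masked word into a Python regular expression. This is an illustration only: it treats a "^" at the start or end of a word as an anchor to the start or end of the field, rejects an unescaped "^" inside a word (reading "must be escaped nevertheless" as a requirement), and leaves word splitting and relation handling to the caller. masked_word_to_regex is a hypothetical name.

```python
import re

ESCAPABLE = set('*?"^\\')  # characters a backslash may escape

def masked_word_to_regex(word: str) -> str:
    """Translate one cql.masked word into a Python regex (hypothetical helper)."""
    out = []
    i, n = 0, len(word)
    while i < n:
        c = word[i]
        if c == "\\":
            # Rule 4: backslash must be followed by an escapable character.
            if i + 1 >= n or word[i + 1] not in ESCAPABLE:
                raise ValueError(f"bad escape at position {i}")
            out.append(re.escape(word[i + 1]))
            i += 2
            continue
        if c == "*":
            out.append(".*")  # rule 1: zero or more characters
        elif c == "?":
            out.append(".")  # rule 2: exactly one character
        elif c == "^":
            # Rule 3: anchor only at the beginning or end of the word.
            if i == 0:
                out.append("^")
            elif i == n - 1:
                out.append("$")
            else:
                raise ValueError("unescaped '^' inside a word")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)
```

With re.fullmatch, the pattern for c?t accepts 'cat' and 'cot' but not 'coast' or 'ct', as in masking example 2; anchored patterns are meant for re.search against the whole field.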

Boolean Modifiers

The CQL context set defines four boolean modifiers, which are only used with the prox boolean operator.

  • distance
    The distance that the two terms should be separated by.
    Takes the form:
         distance [relation] [value]
    where relation is one of: "<", ">", "<=", ">=", "=", "<>"; default "<="
    and value is a non-negative integer.
    e.g. "distance<2"
    default value: 1 when the unit is 'word', 0 otherwise
  • unit
    The type of unit for the distance.
    Takes the form:
          unit=[value]
    where value is one of 'paragraph', 'sentence', 'word' or 'element', or a value from another context set.
    e.g. "unit=sentence"
    default "word"
  • ordered
    The order of the two terms must be as per the query.
  • unordered
    The order of the two terms is unimportant. This is the default.
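The prox modifiers can be illustrated with the sketch below, which handles unit=word and unit=sentence only. It assumes that a document is a list of sentences, each a list of words, and that distance is the absolute difference between the positions of the two words in the chosen unit, so the default distance<=1 for words means adjacent words and distance<=0 for sentences means the same sentence. prox_match and parse_distance are hypothetical names, not part of any CQL library.

```python
import operator
import re
from typing import List, Optional, Tuple

RELATIONS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
             ">=": operator.ge, "=": operator.eq, "<>": operator.ne}

def parse_distance(modifier: str) -> Tuple[str, int]:
    """Parse e.g. "distance<2" into ("<", 2)."""
    m = re.fullmatch(r"distance\s*(<=|>=|<>|<|>|=)\s*(\d+)", modifier)
    if m is None:
        raise ValueError(f"bad distance modifier: {modifier!r}")
    return m.group(1), int(m.group(2))

def _occurrences(doc: List[List[str]], word: str) -> List[Tuple[int, int]]:
    """(sentence index, word index in the whole document) of each occurrence."""
    found, position = [], 0
    for s, sentence in enumerate(doc):
        for token in sentence:
            if token == word:
                found.append((s, position))
            position += 1
    return found

def prox_match(doc: List[List[str]], left: str, right: str,
               relation: str = "<=", distance: Optional[int] = None,
               unit: str = "word", ordered: bool = False) -> bool:
    if unit not in ("word", "sentence"):
        raise ValueError("this sketch handles only unit=word and unit=sentence")
    if distance is None:
        distance = 1 if unit == "word" else 0  # defaults from the spec
    compare = RELATIONS[relation]
    k = 1 if unit == "word" else 0  # which coordinate measures the distance
    for lo in _occurrences(doc, left):
        for ro in _occurrences(doc, right):
            if lo[1] == ro[1]:
                continue  # one word cannot be both operands
            if ordered and ro[1] < lo[1]:
                continue  # 'ordered': the right operand must follow the left
            if compare(abs(ro[k] - lo[k]), distance):
                return True
    return False
```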