SRU (Search/Retrieval Using URL)

SRU Implementors Group Meeting/Integration Workshop

KB (Koninklijke Bibliotheek)

MEETING REPORT, March 1-2, 2006

Released March 29, 2006

AGENDAS: Implementors Group Meeting - Integration Workshop
LINKS: Workshop Presentations - Version 1.2 Changes

 


CQL Modifiers

There will not be a change to CQL to add index modifiers. (Nor will there be a change to add term modifiers. Term modifiers were not part of the proposal but were discussed in some detail, and the idea was eventually abandoned.) Thus all modifiers that would apply to the index, relation, or term will be carried as relation modifiers. (Boolean modifiers will remain boolean modifiers.)

MODS context set

We will develop a bibliographic context set, whose name will be "bibliographic" (short name 'bib'), not "mods".  It will (as in the proposal) be based on MODS semantics.

The bibliographic set will incorporate all that's useful from the Bath context set (and Bath will be deprecated).

The MODS context set proposal can be used as a basis for this bibliographic set, but it needs significant work. A working group will be assigned to develop this set.

Tentative decisions (with regard to the original proposal, and given that there will be no index modifiers):

  • Flatten "type"; for example:
    title/type=abbreviated would instead be titleAbbreviated;
    title/type=uniform would instead be titleUniform;
    etc.
  • authority will become a relation modifier.
  • part will either be flattened (as above) or become a relation modifier.
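Flattening "type" is a mechanical rewrite of the index name. The sketch below assumes the rule implied by the two examples (append the type value with its first letter capitalized); the function name flatten_type is hypothetical, and the final bib index names are for the working group to decide.

```python
def flatten_type(qualified: str) -> str:
    """Turn an index written with a type modifier, e.g. 'title/type=abbreviated',
    into a flattened index name, e.g. 'titleAbbreviated' (assumed naming rule)."""
    index, sep, modifier = qualified.partition("/")
    if not sep:
        return qualified  # no modifier present: nothing to flatten
    name, eq, value = modifier.partition("=")
    if name != "type" or not eq or not value:
        raise ValueError(f"expected a 'type=<value>' modifier, got {modifier!r}")
    # Capitalize only the first letter so 'abbreviated' becomes 'Abbreviated'.
    return index + value[0].upper() + value[1:]


if __name__ == "__main__":
    print(flatten_type("title/type=abbreviated"))
    print(flatten_type("title/type=uniform"))
```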

MARC context set

Change the following:

  • MARC tag
    • ddd --> aaa (i.e. alphanumeric rather than digits)
    • "up to" three characters, rather than a fixed three
  • indicator
    • 0-9 (not just 1 or 2)
  • subfield: any character (not necessarily alphanumeric)
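The revised rules can be expressed as simple validity checks. This is a minimal sketch: the helper names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the indicator is assumed to be a single digit.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits.
ALNUM = set(string.ascii_letters + string.digits)


def valid_tag(tag: str) -> bool:
    """Tag: one to three alphanumeric characters (previously exactly three digits)."""
    return 1 <= len(tag) <= 3 and all(c in ALNUM for c in tag)


def valid_indicator(indicator: str) -> bool:
    """Indicator: a single digit 0-9 (previously only 1 or 2)."""
    return len(indicator) == 1 and indicator in string.digits


def valid_subfield(code: str) -> bool:
    """Subfield: any single character, not necessarily alphanumeric."""
    return len(code) == 1
```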

OpenURL context set

There will not be an OpenURL context set. Instead, there will be an OpenURL profile.

The profile will prescribe a mapping from bibliographic indexes to OpenURL keys. This will be a complex task and hopefully will be taken on by the bib group.  

The premise behind the context set proposal had been that a resolver, upon receiving an OpenURL request, might want to search via SRU as part of the resolution process. The theory was that the resolver could take the keys from the received request and turn them directly into search indexes, which would make the task of creating the search much simpler.  

However, this would not be useful unless the server understands and supports those indexes. Since we are planning to develop bibliographic indexes (the bib set) and we want servers to support them, it would add too much complexity and cause confusion to also ask that servers support the suggested OpenURL set.

So instead, the resolver should map the keys to bibliographic indexes, and the profile will specify that mapping. This will make the process somewhat more difficult for the resolver than if it could simply use the keys as indexes; however, the process will still be simpler than it is today, because a mapping will be available where none exists now.
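A resolver following such a profile might turn the referent keys of a key/value OpenURL into a CQL query roughly as sketched below. The rft.* keys are OpenURL 1.0 KEV keys, but the index names on the right are placeholders only: the real mapping is what the profile will define, and the bib set does not exist yet.

```python
from urllib.parse import parse_qs

# Hypothetical mapping: OpenURL KEV keys -> placeholder bib index names.
KEY_TO_INDEX = {
    "rft.atitle": "bib.title",
    "rft.btitle": "bib.title",
    "rft.au": "bib.name",
    "rft.isbn": "bib.identifier",
    "rft.issn": "bib.identifier",
    "rft.date": "bib.date",
}


def cql_quote(term: str) -> str:
    """Double-quote a CQL term, escaping backslashes and embedded quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def openurl_to_cql(kev: str) -> str:
    """Map the mappable keys of an OpenURL KEV query string to a CQL query."""
    clauses = []
    for key, values in parse_qs(kev).items():
        index = KEY_TO_INDEX.get(key)
        if index is None:
            continue  # keys with no mapping (ctx_ver, rfr_id, ...) are ignored
        for value in values:
            clauses.append(f"{index}={cql_quote(value)}")
    if not clauses:
        raise ValueError("no mappable referent keys in the OpenURL")
    return " and ".join(clauses)


if __name__ == "__main__":
    print(openurl_to_cql("rft.btitle=Alice+in+Wonderland&rft.au=Carroll&ctx_ver=Z39.88-2004"))
```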

The profile may also specify how an SRU response can help a client formulate an OpenURL. This scenario is roughly the reverse of the one above: there, a resolver receives an OpenURL and wants to formulate an SRU request; here, an SRU client receives a record and wants to create an OpenURL (where the object described by that record will be the referent). A client could use SRU to find an item of interest, then request the record for that item in the appropriate OpenURL schema (for example, http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:book for books, or http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:journal for journals) and use it to formulate an OpenURL request.
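On the client side, requesting a record in an OpenURL schema is an ordinary searchRetrieve request with the recordSchema parameter set. A sketch follows, assuming version 1.1 parameters, a hypothetical base URL, and that the server accepts "info:ofi/fmt:xml:xsd:book" as a schema identifier (a real client would check the server's Explain record).

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql: str, record_schema: str,
                   start_record: int = 1, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL asking for records in a given schema."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "recordSchema": record_schema,  # assumed identifier for the OpenURL book schema
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes ':' and '/' in values, as a query string requires.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(sru_search_url("http://example.com/sru", "dc.title=alice",
                         "info:ofi/fmt:xml:xsd:book", maximum_records=1))
```

The returned record would then supply the referent metadata for the OpenURL the client constructs.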

We will solicit advice from the OpenURL community in developing this profile.

Proximity

Proximity units (other than those in the cql set) should be defined so that the modifier "unit" itself belongs to a context set, rather than the unit's value belonging to a context set. For example, suppose you want to define "street" as a proximity unit within context set 'xyz'.

Do it like this:   prox/xyz.unit="street"

rather than this:  prox/unit=xyz.street
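A client that builds proximity clauses would therefore qualify the modifier name, not the value. A minimal sketch; the helper name is hypothetical, and the unit value is assumed to be a simple token that needs no escaping.

```python
def prox_clause(left: str, right: str, context_set: str, unit: str) -> str:
    """Build 'left prox/<set>.unit="<unit>" right', with the modifier name
    (not the value) qualified by the context set."""
    # Discouraged alternative would be: prox/unit=<set>.<unit>
    return f'{left} prox/{context_set}.unit="{unit}" {right}'


if __name__ == "__main__":
    print(prox_clause("fire", "station", "xyz", "street"))
```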

Proximity units 'word', 'sentence', 'paragraph', which are included in the cql set, will be explicitly undefined.

All other proximity issues are deferred until version 2.0.

Sort

The sort proposal was accepted, with the provision that there needs to be additional prose to describe case insensitivity better.

SRU via POST

The SRU via POST proposal is accepted. It will be referred to as "SRU Post".

See http://www.loc.gov/standards/sru/sru-post.html

OpenSearch

Advantages of SRU vs. OS:

  • CQL
  • schemas
  • scan
  • diagnostics
  • stability

The strategy we discussed is to make OpenSearch requests legitimate SRU requests. Then an SRU-friendly OS server will be able to do something intelligent when it gets an SRU-loaded OS request.

For example, suppose the following three parameters occur in an OS request:

  • query="alice lewis"
  • x-os-title=alice
  • x-os-creator=lewis

The latter two are (syntactically) legitimate SRU parameters, where "x-" indicates an extension. An SRU-friendly OS server might combine these into a CQL query. An ordinary OS server will ignore them (because it ignores whatever it doesn't understand) and will just process the two search terms.
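An SRU-friendly server might do that combination roughly as sketched below. The mapping of x-os-title and x-os-creator to the Dublin Core indexes dc.title and dc.creator is an assumption; when no x-os-* parameters are present, the sketch returns None so the server can fall back to the plain query.

```python
from typing import Optional
from urllib.parse import parse_qs

# Assumed mapping from x-os-* extension parameters to Dublin Core indexes.
X_OS_TO_INDEX = {"x-os-title": "dc.title", "x-os-creator": "dc.creator"}


def cql_quote(term: str) -> str:
    """Double-quote a CQL term, escaping backslashes and embedded quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def x_os_to_cql(query_string: str) -> Optional[str]:
    """Combine x-os-* parameters into one CQL query, or return None."""
    params = parse_qs(query_string)
    clauses = [
        f"{index}={cql_quote(value)}"
        for name, index in X_OS_TO_INDEX.items()
        for value in params.get(name, [])
    ]
    return " and ".join(clauses) if clauses else None


if __name__ == "__main__":
    print(x_os_to_cql("query=alice+lewis&x-os-title=alice&x-os-creator=lewis"))
```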

With OpenSearch 1.1, we note that SRU's 'startRecord' is the same as OS 'startIndex', and SRU's 'maximumRecords' is the same as OS 'count'. Further, with appropriate namespace declarations, the OS declaration:

<Url type="application/rss+xml"
     template="http://example.com/?query={searchTerms}&amp;startRecord={startIndex}&amp;maximumRecords={count}&amp;x-format=rss"/>

will correctly describe a valid SRU query. The contents of {searchTerms} would have to be a valid CQL query, which would require at least simple parsing of {searchTerms}.

If we want to avoid CQL parsing, a valid OS declaration would be:

<Url type="application/rss+xml"
     template="http://example.com/?query={searchTerms}&amp;x-os-title={dc:title?}&amp;x-os-creator={dc:creator?}&amp;startRecord={startIndex}&amp;maximumRecords={count}&amp;x-format=sru"/>

OS 1.0 (single-field) servers could probably use the x- workaround, but perhaps everybody would be better off if they upgraded to the OS 1.1 (multi-field) standard. Standard guidance for mappings between, for example, x-os-creator and {dc:creator} would be good. It is likely possible to auto-generate a reasonable SRU Explain record from an OS description.

In any implementation merging SRU and OS, the two main issues are CQL parsing and response format.
We can provide generic solutions for simple parsing of CQL, or use the x- workaround to avoid this entirely.
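One generic and deliberately simple approach, offered here as an assumption rather than an agreed solution, is to avoid real parsing altogether: pass a single plain word through unchanged and otherwise wrap the whole OS search string as one quoted CQL term, so that the server's default index and relation apply.

```python
# Characters that cannot appear in an unquoted CQL term (conservative set).
_SPECIAL = set(' \t\r\n()=<>/"\\')
# Words that a CQL parser would read as operators rather than terms.
_RESERVED = {"and", "or", "not", "prox", "sortby"}


def os_terms_to_cql(search_terms: str) -> str:
    """Turn free-text OpenSearch {searchTerms} into a valid CQL search clause."""
    s = search_terms.strip()
    if s and not any(c in _SPECIAL for c in s) and s.lower() not in _RESERVED:
        return s  # a single plain word is already a valid CQL term
    # Otherwise quote the whole string, escaping backslashes and double quotes.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


if __name__ == "__main__":
    for text in ["alice", "alice lewis", 'say "hi"', "and"]:
        print(os_terms_to_cql(text))
```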

However, the response formats of the two standards differ. OS returns lightly modified RSS, whereas SRU wraps records in any of a variety of schemas, one of which might be RSS. Implementors of OS servers would have to go through their code and switch on x-format to create the relevant wrappers and namespaced fields. This is likely not a very burdensome overhead for someone who has already implemented OS.
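The "switch on x-format" step might look roughly like the sketch below. It is simplified: it uses the SRU 1.1 response namespace, assumes "rss" as the recordSchema name for RSS items, omits RSS channel elements such as title and description, and the item fields (title, link) are hypothetical.

```python
import xml.etree.ElementTree as ET

SRW = "http://www.loc.gov/zing/srw/"            # SRU 1.1 response namespace
OS = "http://a9.com/-/spec/opensearch/1.1/"     # OpenSearch 1.1 namespace


def q(ns: str, tag: str) -> str:
    return f"{{{ns}}}{tag}"


def rss_item(item: dict) -> ET.Element:
    el = ET.Element("item")
    ET.SubElement(el, "title").text = item["title"]
    ET.SubElement(el, "link").text = item["link"]
    return el


def render(items: list, total: int, x_format: str = "rss") -> bytes:
    """Serialize the same result items as OS-style RSS or as an SRU response."""
    if x_format == "sru":
        root = ET.Element(q(SRW, "searchRetrieveResponse"))
        ET.SubElement(root, q(SRW, "version")).text = "1.1"
        ET.SubElement(root, q(SRW, "numberOfRecords")).text = str(total)
        records = ET.SubElement(root, q(SRW, "records"))
        for pos, item in enumerate(items, start=1):
            rec = ET.SubElement(records, q(SRW, "record"))
            ET.SubElement(rec, q(SRW, "recordSchema")).text = "rss"  # assumed name
            ET.SubElement(rec, q(SRW, "recordPacking")).text = "xml"
            ET.SubElement(rec, q(SRW, "recordData")).append(rss_item(item))
            ET.SubElement(rec, q(SRW, "recordPosition")).text = str(pos)
    else:  # default: plain OpenSearch-style RSS
        root = ET.Element("rss", version="2.0")
        channel = ET.SubElement(root, "channel")
        ET.SubElement(channel, q(OS, "totalResults")).text = str(total)
        for item in items:
            channel.append(rss_item(item))
    return ET.tostring(root, encoding="utf-8")
```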

OAI Profile

An OAI over SRU profile will be defined. It will specify that a server support the following three indexes:

  • rec.identifier
  • rec.lastModificationDate
  • CollectionIdentifier

An extension will be defined so that "extra data" may be returned -- the following three elements (corresponding to the above three):

  • oai:identifier
  • oai:datestamp
  • oai:setSpec
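With these three indexes, an OAI-style selective harvest (from, until, set) maps onto an ordinary CQL query. A sketch under stated assumptions: the function name is hypothetical, inclusive bounds are assumed, and a full harvest with no restriction would need a match-all query that this sketch does not attempt.

```python
from typing import Optional

DATE_INDEX = "rec.lastModificationDate"  # returned as oai:datestamp
SET_INDEX = "CollectionIdentifier"       # returned as oai:setSpec


def cql_quote(term: str) -> str:
    """Double-quote a CQL term, escaping backslashes and embedded quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def oai_harvest_query(from_date: Optional[str] = None, until: Optional[str] = None,
                      set_spec: Optional[str] = None) -> str:
    """Translate OAI-PMH-style harvest arguments into a CQL query."""
    clauses = []
    if from_date:
        clauses.append(f"{DATE_INDEX}>={cql_quote(from_date)}")
    if until:
        clauses.append(f"{DATE_INDEX}<={cql_quote(until)}")
    if set_spec:
        clauses.append(f"{SET_INDEX}={cql_quote(set_spec)}")
    if not clauses:
        raise ValueError("a full harvest needs a match-all query (not covered here)")
    return " and ".join(clauses)
```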

Copyright/License

There is currently no copyright or license indication in the spec. We will investigate adopting a liberal copyright permitting unlimited reproduction, with a Creative Commons-type license. There must, however, be sufficient ownership to prevent someone from claiming that a modified version is, for example, "SRU 2.1". More discussion is needed on this.

XPath

XPath will be relegated to an extension, and it will become optional.

Record-Id

The record-id proposal, to add a recordIdentifier element as an optional field to the record structure, was approved. Semantics:

"This element contains a persistent, opaque, unique identifier for this record within this database, which can be subsequently used to retrieve the same record using a search on the 'rec.identifier' index. The identifier is not required to be globally unique, and nothing may be assumed about its structure."

Base URL in Response

The base URL will be included (optional, but strongly recommended) in the SRU response, within the echoed query (at the end).

Record Hits

The following search information will be incorporated (as optional elements) into XCQL, for each subquery:

  • hits  (number of records that matched the subquery)
  • term hits, for each term (number of occurrences of the term)
  • diagnostic(s)
  • recommended subquery

By "subquery", we mean that this information could apply for any search clause, simple or complex. For example, for:

(A and B) and C

The following are subqueries:

  • A
  • B
  • (A and B)
  • C
  • (A and B) and C
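The subqueries are exactly the nodes of the query's parse tree, so a post-order walk lists them in the order given above. The sketch below uses a hypothetical two-class tree rather than XCQL itself; per-subquery hits, term hits, and diagnostics would be attached to these same nodes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    text: str


@dataclass
class Bool:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Term, Bool]


def render(node: Node, top: bool = True) -> str:
    """Render a node as CQL text, parenthesizing nested boolean clauses."""
    if isinstance(node, Term):
        return node.text
    s = f"{render(node.left, False)} {node.op} {render(node.right, False)}"
    return s if top else f"({s})"


def subqueries(root: Node) -> List[str]:
    """List every subquery (every tree node) in post-order."""
    out: List[str] = []

    def walk(node: Node) -> None:
        if isinstance(node, Bool):
            walk(node.left)
            walk(node.right)
        out.append(render(node, top=node is root))

    walk(root)
    return out


query = Bool("and", Bool("and", Term("A"), Term("B")), Term("C"))
print(subqueries(query))  # ['A', 'B', '(A and B)', 'C', '(A and B) and C']
```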

Z39.92

Z39.92 will replace the current Explain specification in the next SRU version (1.2 or 2.0).

CQL Name Change

Participants agreed to a suggestion to change the name of CQL from "Common Query Language" to "Contextual Query Language".

"Don't Care about Record Count"

An extension will be defined to allow the client to indicate that it does not care whether or not the server includes the parameter numberOfRecords in the response.  (This will mean making the parameter optional.)

The reason for this is the concern that in some environments, counting the records accurately is expensive.

"Number of Records Approximate"

A diagnostic will be defined to indicate that the number of records indicated by the numberOfRecords parameter is approximate.

Implementation Id Superseded

An earlier decision (June 2005) to add an implementationId parameter has been superseded by the addition of the base URL (described above), so that parameter is no longer necessary.

Standardization

The basic standardization plan presented, to take SRU to OASIS, was approved in principle. Included along with SRU would be:

  • CQL
  • Scan
  • the Explain Operation (but not the Explain spec itself)
  • mappings

Mappings would be:

  • SRU (i.e. URL via HTTP GET)
  • SRU over SOAP  (i.e. SRW)
  • SRU Post

Thus SRW would be renamed "SRU over SOAP".

Note that SRU Record Update would not be part of this process.

There is a suggestion that after the OASIS process, we should fast-track the standard in NISO, and after that, in ISO.

We need three OASIS members to initiate the process. Oxford is one. LC is currently trying to join.  Another possibility is University of Manchester.

Next Version

The OASIS process will produce SRU 2.0. In the interim, version 1.2 may be released, and it will be the input to the process. (If it is decided not to release a version 1.2, then 1.1 will be the input.)   See List of Changes for Version 1.2.