SRU (Search/Retrieval Using URL)

SRU Implementors/Ed. Board Meeting
June 18, 2007 - Library of Congress

Meeting Report - June 27

Attendees

  1. Rebecca Guenther, NISO
  2. Larry Dixson, LC
  3. Ed Summers, LC
  4. Dan Chudnov, LC
  5. Ryan Scherle, Indiana University
  6. Pat Case, CRS
  7. Ardie Bausenbach, LC 
  8. Ray Denenberg, SRU Ed. Board
  9. Rob Sanderson, University of Liverpool

Topics

OASIS Process

The meeting began with a review of the OASIS process (see Charter). The Technical Committee is named "OASIS Search Web Services Technical Committee", abbreviated "Search WS TC"; its purpose is to define Search and Retrieval Web Services, combining current and ongoing web service activities. The scope includes:

  • Search/Retrieve
  • Query
  • Sorting
  • Record Retrieval
  • Index Browsing

One or more application profiles will be developed, not necessarily (and most likely not) by the TC, but within the appropriate community, for example bibliographic, e-learning, or geospatial. The work will involve semantic description of search services, but will build upon existing work (e.g. NISO Z39.92) rather than define new descriptions, and will seek input from abstract API initiatives such as OKI, ZOOM, and SQI. However, the development or standardization of abstract APIs is out of scope. SRU and CQL will be used as the starting points. The expected deliverables are service definitions, schemas, interface specifications (POST, GET, SOAP), a query language definition, and at least one community-defined profile.

The first meeting of the TC will be held by teleconference on July 18. Work will be carried out primarily by email and teleconference, possibly every two weeks, with face-to-face meetings perhaps once or twice a year. After the first several calls, an initial face-to-face meeting of perhaps one and a half or two days may be held. The TC will determine its own schedule once organized.

There will be a TC listserv as well as a separate implementors listserv. The TC listserv archive will be publicly accessible (only TC members can post). The implementors listserv will be open to anyone, including non-OASIS members. [Need to determine procedures to join the open list.]

See Joining the OASIS Search Web Services Technical Committee.

CQL bibliographic searching

The August 7, 2006 draft (with a minor revision on June 11, 2007) was reviewed, and the following changes were agreed upon.

  • Remove bib.titleSub. (It is redundant. There is a sub modifier.)
  • Add dc.creator to the list of dc elements to search on (currently contributor and publisher), rather than using bib.name with a role modifier.
  • Make marcrelator the default for bib.roleAuthority (MARC Value List for Relators and Roles).
  • Make 'w3cdtf' the default authority for bib date indexes.
  • Add dc.date, for searching on non-specific dates.
  • Combine Resource Type and Genre into Resource Type/Genre.  The indexes will be dc.type and bib.genre, with modifier bib.typeauthority.
  • Make the default "server defined" for bib.languageAuthority. Guidance provided by RFC 3066 is recommended.
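Since 'w3cdtf' becomes the default authority for bib date indexes, a client might want to check that a date term actually follows the W3C Date and Time Formats (W3CDTF) profile of ISO 8601 before sending it. The following is a minimal, syntax-only sketch; the function name is hypothetical, and the report does not say that clients must validate dates at all.

```python
import re

# W3CDTF allows six granularities:
#   YYYY | YYYY-MM | YYYY-MM-DD | YYYY-MM-DDThh:mmTZD
#   | YYYY-MM-DDThh:mm:ssTZD | YYYY-MM-DDThh:mm:ss.sTZD
# TZD is "Z" or "+hh:mm"/"-hh:mm" and is required once a time is given.
W3CDTF = re.compile(
    r"\d{4}"                  # year
    r"(-\d{2}"                # month
    r"(-\d{2}"                # day
    r"(T\d{2}:\d{2}"          # hours and minutes
    r"(:\d{2}(\.\d+)?)?"      # optional seconds and decimal fraction
    r"(Z|[+-]\d{2}:\d{2})"    # time zone designator
    r")?)?)?"
)

def is_w3cdtf(term: str) -> bool:
    """Syntax-only check (hypothetical helper): out-of-range values such as month 13 are not rejected."""
    return W3CDTF.fullmatch(term) is not None
```

For example, "1884-02-18" and "1997-07-16T19:20+01:00" pass, while "18 Feb 1884" and "1997-07-16T19:20" (a time with no zone designator) do not.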

OpenURL Profile

There are three possible use cases for an OpenURL profile.

OpenURL search points - index or mapping?

Use case: An OpenURL resolver, upon receiving an OpenURL request, might want to search via SRU as part of the resolution process. The resolver could take the keys from the received request, map them to bibliographic indexes, and formulate an SRU request.

Background: Prior to the March 2006 meeting (more than a year ago), a set of OpenURL indexes had been proposed. At that meeting the consensus was that, instead of defining explicit indexes, a mapping from the desired search points to bib indexes would be preferable, since it seemed unlikely that the indexes would be implemented. However, in discussion preceding the recent meeting (June 2007), it was suggested that the sample mappings are too complicated and that simple indexes would be preferable, in effect suggesting that the earlier (March 2006) decision be reversed.

The meeting participants seemed to feel that a mixed approach is best. Some indexes need to be defined because the alternative mappings are too complicated. On the other hand, some of the OpenURL search points map well to bib indexes.
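The resolver use case and the mixed approach can be sketched as follows: OpenURL keys that map well are turned into CQL clauses on existing indexes, and the remaining referent keys are reported as candidates for explicitly defined indexes. The key-to-index table, the base URL, and the function names are all hypothetical; the report leaves the actual search points to be determined.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical mapping of OpenURL (KEV) keys to existing indexes.
KEY_TO_INDEX = {
    "rft.atitle": "dc.title",
    "rft.btitle": "dc.title",
    "rft.aulast": "dc.creator",
    "rft.date": "dc.date",
}

def cql_term(value: str) -> str:
    """Quote a CQL term, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def openurl_to_sru(openurl_query: str, base: str = "http://example.com/sru"):
    """Return (SRU searchRetrieve URL, referent keys with no mapped index)."""
    keys = parse_qs(openurl_query)
    clauses = [f"{index} = {cql_term(value)}"
               for key, index in KEY_TO_INDEX.items()
               for value in keys.get(key, [])]
    # Keys such as rft.spage have no obvious bib index; under the mixed
    # approach these are candidates for explicitly defined OpenURL indexes.
    unmapped = sorted(k for k in keys
                      if k.startswith("rft.") and k not in KEY_TO_INDEX)
    params = {"operation": "searchRetrieve", "version": "1.2",
              "query": " and ".join(clauses)}
    return f"{base}?{urlencode(params)}", unmapped
```

Given the request "url_ver=Z39.88-2004&rft.atitle=Cold+fusion&rft.aulast=Smith&rft.spage=12", the sketch produces the CQL query dc.title = "Cold fusion" and dc.creator = "Smith", and reports rft.spage as unmapped.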

The next step is to determine which search points are useful. An initial set is listed in the document Searching on OpenURL Keys.

SRU to OpenURL

Use case: An SRU client receives a record and wants to create an OpenURL in which the object described by that record is the referent. A client could use SRU to find an item of interest, then request the record for that item in the appropriate OpenURL schema (for example, http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:book for books, or http://www.openurl.info/registry/docs/xsd/info:ofi/fmt:xml:xsd:journal for journals), and use it to formulate an OpenURL request.
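The two steps of this use case might look like the sketch below. It assumes the server accepts the OpenURL book format identifier as an SRU recordSchema value, and that the book fields have already been extracted from the returned record (XML parsing is omitted). Host names and function names are placeholders.

```python
from urllib.parse import urlencode

# Assumption: the server accepts this identifier as an SRU recordSchema value.
BOOK_XML_SCHEMA = "info:ofi/fmt:xml:xsd:book"

def sru_book_request(base: str, query: str) -> str:
    """Ask the SRU server for the first matching record in the OpenURL book schema."""
    params = {"operation": "searchRetrieve", "version": "1.2", "query": query,
              "recordSchema": BOOK_XML_SCHEMA, "maximumRecords": "1"}
    return f"{base}?{urlencode(params)}"

def openurl_for_book(resolver: str, fields: dict) -> str:
    """Build a Z39.88-2004 KEV OpenURL whose referent is the book described by fields.

    fields holds values already pulled from the returned record, keyed by
    book-format names such as "btitle", "au", or "isbn".
    """
    params = {"url_ver": "Z39.88-2004",
              "rft_val_fmt": "info:ofi/fmt:kev:mtx:book"}
    params.update({f"rft.{name}": value for name, value in fields.items()})
    return f"{resolver}?{urlencode(params)}"
```

For example, openurl_for_book("http://resolver.example.org/", {"btitle": "Walden"}) yields an OpenURL carrying rft.btitle=Walden alongside the version and format keys.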

SRU as an OpenURL Application

Use case: Rob and Dan will write this up.

OAI Profile for SRU

This part of the report is to be rewritten (Rob).

The profile would be roughly based on the ideas presented in the Sanderson/Young/LeVan 2005 DLIB article SRW/U with OAI: Expected and Unexpected Synergies, and the following summary is based on that article.

  • SRU Interfaces to OAI Aggregated Data
    allow the data harvested via OAI to be searched via SRU.
  • OAI Interfaces to SRW Provided Data
    building OAI on top of SRW.
    OAI has some requirements that SRU is not required to support, so these features can be profiled:
    • Three Indexes:
      • oai.identifier: a unique identifier for each record in the database
      • oai.datestamp: date/time the record was added or changed in the database
      • oai.set: browsable via the scan operation, to support selective harvesting of records
    • an extension providing an extraRecordData element with an oai:header fragment that includes the identifier, datestamp, and setSpec.
    This would provide support for the following OAI functions:
      • Identify: generated from selected parts of an SRU Explain response.
      • ListMetadataFormats: from the schemaInfo of the Explain response.
      • ListSets: from an SRU Scan of the oai.set index.
      • ListRecords and ListIdentifiers: from an SRU Search/Retrieve against the oai.datestamp and oai.set indexes.
      • GetRecord: from an SRU Search/Retrieve against the oai.identifier index.
  • OAI Retrieval of SRW Discovered Data
    In OAI, sets are an "optional construct for grouping items for the purpose of selective harvesting", predefined, but left up to the repository to design and describe. SRU has dynamic sets, i.e. result sets. If a server had both SRU and OAI interfaces to the same collection, a search could be performed in SRU, creating a set. Via extension metadata, information about the search, such as a suggested human-readable name and description, could be sent at the same time. Once the result set has been created, it could be automatically exposed in the OAI interface for retrieval.
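The mapping of OAI functions onto SRU operations described above can be sketched as a translation from an OAI-PMH request to SRU 1.2 parameters. This is an illustration only: resumption tokens, errors, and the oai:header extension are left out; using metadataPrefix as the recordSchema, starting the scan at an empty term, and using cql.allRecords for an unrestricted harvest are assumptions, not part of the report.

```python
from typing import Dict, Optional

def cql_term(value: str) -> str:
    """Quote a CQL term, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def oai_to_sru(verb: str, args: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Translate an OAI-PMH request (verb plus arguments) into SRU 1.2 parameters."""
    args = args or {}
    base = {"version": "1.2"}
    if verb in ("Identify", "ListMetadataFormats"):
        # Both are answered from the Explain record (formats via schemaInfo).
        return {**base, "operation": "explain"}
    if verb == "ListSets":
        # Browse the oai.set index from its beginning.
        return {**base, "operation": "scan", "scanClause": 'oai.set = ""'}
    if verb == "GetRecord":
        query = f"oai.identifier = {cql_term(args['identifier'])}"
    elif verb in ("ListRecords", "ListIdentifiers"):
        clauses = []
        if "from" in args:
            clauses.append(f"oai.datestamp >= {cql_term(args['from'])}")
        if "until" in args:
            clauses.append(f"oai.datestamp <= {cql_term(args['until'])}")
        if "set" in args:
            clauses.append(f"oai.set = {cql_term(args['set'])}")
        # With no restrictions, harvest everything.
        query = " and ".join(clauses) or "cql.allRecords = 1"
    else:
        raise ValueError(f"unsupported OAI verb: {verb}")
    return {**base, "operation": "searchRetrieve", "query": query,
            "recordSchema": args["metadataPrefix"]}
```

A selective harvest such as ListRecords with from=2007-01-01 and set=maps becomes the query oai.datestamp >= "2007-01-01" and oai.set = "maps".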

Holdings Schema

(This section was supplied by Janifer Gatenby)

The ISO Holdings Schema (ISO 20775) is designed to replace the Z39.50/SRU holdings and OPAC schemas. This standard differs from many other holdings standards in that it is primarily designed to be used in search responses rather than for reporting purposes. As a response schema, it combines relatively static information with dynamic information. The dynamic information comprises availability and policy information that may differ depending on the requester, and also usage history. It covers all holdings, physical and electronic. Another important feature of the schema is a summary section for a group of "interchangeable copies" that can be readily parsed and displayed, indicating availability and policy (e.g. terms of delivery). The summary is flexible enough to cover multiple definitions of "interchangeability" depending on user needs, e.g. multiple copies (physical and digital) of a book or article, multiple copies of different editions of a work, and multiple copies of different works in a result set. The schema includes detail about holdings and an optional section about the resource or group of resources to which the holdings pertain. Thus the schema may be used standalone, or as a fragment of a larger schema, e.g. MODS or ONIX. Two example scenarios:

  • A query requests bibliographic and holdings detail be returned in the response.  The results are sent as MODS records with Holdings schema embedded in each record to include holdings (the holdings schema does not include a redundant resource section).

  • A query requests bibliographic details, which are returned in MODS. A follow-on query requests holdings detail for the records in the previous result set. In this case the ISO Holdings Schema is used with the bibliographic identifiers in the resource section.
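To illustrate how one summary can serve several definitions of "interchangeability", the sketch below groups copies with a caller-supplied key and reports total and available copies per group. The Copy fields and function names are hypothetical simplifications and are not ISO 20775 element names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Tuple

@dataclass(frozen=True)
class Copy:
    """Hypothetical, simplified copy record (not ISO 20775 element names)."""
    work: str
    edition: str
    form: str        # e.g. "physical" or "digital"
    available: bool

def summarize(copies: Iterable[Copy],
              key: Callable[[Copy], Hashable]) -> Dict[Hashable, Tuple[int, int]]:
    """Group copies judged interchangeable by `key`; return (total, available) per group."""
    totals: Counter = Counter()
    available: Counter = Counter()
    for c in copies:
        k = key(c)
        totals[k] += 1
        if c.available:
            available[k] += 1
    return {k: (totals[k], available[k]) for k in totals}
```

Grouping the same copies by edition or by work gives two different summaries, matching the "copies of a book" and "copies of various editions of a work" cases above.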

The standard is long awaited, and approval is expected by the end of 2007. The first attempt at a holdings schema that included item availability was the Z39.50 OPAC schema. The Z39.50 holdings schema was an attempt to supersede this OPAC schema, but it was complicated, little understood, and consequently very sparsely implemented. In 2004, ISO formed a working group to create a holdings schema that would overcome the limitations of the Z39.50 OPAC and Holdings Schemas. After a slow start the group re-energized in 2006, and an XML version of the schema is available for testing purposes at:
http://oclcpica.org/?id=1013&ln=uk.

Record metadata

A Specification for Requesting Record metadata via SRU has been developed. As part of this work, a draft XML namespace has been developed, and there is a draft Namespace Information Page. The 'rmd' schema still needs to be developed.

Agreements

  • The use of the expression "administrative metadata" will be struck from the document. "Record metadata" will be used instead.
  • MODS elements from recordInfo will be added.
  • Some of the elements from rec 1.1 are missing and will be added.

Limitation if Identifier and Result Set not Available

If a client wants to retrieve record metadata for a specific record, and it knows either the record identifier or the result set position, it can request the record by identifier or by result set position, specifying the rmd schema or some other record metadata schema. (If the client has already retrieved the record and the record has an identifier, then under SRU 1.2 the client knows that identifier, because the identifier is now part of the record response structure.)

However, if the record does not have an identifier, and if the server does not support result sets, then neither mechanism (record identification via identifier or result set position) is available.  In that case, the only way to retrieve the record metadata is by explicitly requesting that it accompany the record data. The client must request it via extraRequestData, and the server supplies it via extraRecordData.
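The three options for retrieving record metadata, in order of preference, can be sketched as below. The use of rec.identifier and cql.resultSetId assumes the server supports those indexes; the extension parameter name x-example-recordMetadata is hypothetical (the report only says extraRequestData is used), and the "x-" prefix follows the SRU convention for extension parameters in GET requests.

```python
from typing import Dict, Optional

def cql_term(value: str) -> str:
    """Quote a CQL term, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def record_metadata_request(identifier: Optional[str] = None,
                            result_set_id: Optional[str] = None,
                            position: Optional[int] = None,
                            original_query: Optional[str] = None) -> Dict[str, str]:
    """Choose how to ask an SRU 1.2 server for a record's metadata.

    1. by record identifier, in the rmd schema;
    2. by result set position, in the rmd schema;
    3. otherwise re-run the search and ask, via an extension parameter
       (name hypothetical), for record metadata to accompany the record
       data; the server would return it in extraRecordData.
    """
    params = {"operation": "searchRetrieve", "version": "1.2"}
    if identifier is not None:
        return {**params, "query": f"rec.identifier = {cql_term(identifier)}",
                "recordSchema": "rmd", "maximumRecords": "1"}
    if result_set_id is not None and position is not None:
        return {**params, "query": f"cql.resultSetId = {cql_term(result_set_id)}",
                "startRecord": str(position), "maximumRecords": "1",
                "recordSchema": "rmd"}
    if original_query is None:
        raise ValueError("no identifier, result set, or query to fall back on")
    return {**params, "query": original_query, "x-example-recordMetadata": "rmd"}
```

Only the third branch works when the record has no identifier and the server does not support result sets, which is exactly the limitation described above.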

Add discussion of this limitation to the record metadata document.

Record Update

See the June 8 update and Namespace Information Page. This is awaiting completion of the schema and WSDL bindings, which should be finished soon. Record Update will not be part of the OASIS work, but could be part of the bibliographic profile.

Bibliographic Profile for SRU

The premise behind a bibliographic profile for SRU is that work on the base protocol for SRU 2.0 will be done in OASIS, and community profiles developed in appropriate communities. We hope that a bibliographic profile will be taken on by NISO. It would include:

  • bib context set
    • Related bibliographic mappings
    • Semantics
  • OpenURL
    • mapping
    • context set
    • other scenarios
  • OAI
    • as described above
  • Holdings schema
    • scenarios described above
  • Record Metadata
  • Record Update

Rebecca will investigate the possibility of NISO taking on this work.

The OASIS standardization and the profiling processes should proceed in parallel with liaison activity.  The profiling activity may result in additional requirements for the protocol and these should be forwarded to the OASIS TC.

XQuery and CQL

Discussion of accommodating XQuery within CQL, or vice versa, or of profiling XQuery, is deferred. If we discover XQuery functions that are desired but not supported by CQL, we should renew discussions.

Completing the 1.2 site

See draft SRU 1.2. Following the meeting, we hope to make this the official SRU spec as soon as possible. (It will still need to undergo approval at LC.) The URL for the draft site is http://www.loc.gov:8081/standards/sru/ and for the current (1.1) SRU, http://www.loc.gov/standards/sru/, so the only difference is the :8081 port number (which simply indicates the LC test server).

Several of the pages have not yet been written (for example the introductions) and these will need to be written or the pointers removed.  Rob will write the introductions for SRW, CQL, and Zeerex. 

Discussion of "Frequently Asked Questions" (FAQ). There should be an "official" FAQ and an "unofficial" FAQ. The official FAQ would be under control of the Ed. Board, maintained by Rob.  The unofficial FAQ would reside on the SRU wiki.

SRU Explain/zeeRex/Z39.92

Issues:

  • The OASIS charter references Z39.92, and it is planned that the completed standard will use Z39.92 for Explain. But Z39.92 was never finished; it seems to have been abandoned by NISO along with the other metasearch work. Rob and I will inquire about its status and whether there is any prospect of getting it finished.
  • In fact, SRU 1.2 is supposed to reference Z39.92, according to What's New in Version 1.2?, but nothing in the 1.2 spec reflects any such change. Rob's opinion is that Z39.92 is completely compatible with SRU, so no technical change is necessary (he will double-check); we just need to mention Z39.92 somewhere.
  • The terms "Explain" and "Zeerex" seem to be used interchangeably, but they have somewhat different meanings. Rob will review all such terminology in the 1.2 spec.

OpenSearch, SRU and RSS

OpenSearch now has a website, openSearch.com, but it isn't clear who is in charge of the spec since its founder, DeWitt Clinton, has moved to Google. It is still in the domain of A9.

Response Format

It was the consensus of the meeting that there should be a parameter (in SRU version 2.0) to specify the requested response schema: SRU, RSS, Atom, etc.

Integrating SRU and OpenSearch

One strategy is to make OpenSearch (OS) requests legitimate SRU requests. Then an SRU-friendly OS server will be able to do something intelligent when it receives an SRU-loaded OS request.

SRU's 'startRecord' is the same as OS 'startIndex', and SRU's 'maximumRecords' is the same as OS 'count'. So an OS declaration:

<Url type="application/rss+xml"
     template="http://example.com/?query={searchTerms}&amp;startRecord={startIndex}&amp;maximumRecords={count}"/>

will correctly describe a valid SRU request.

However, we need both:

  • query={searchTerms}    and
  • query="{searchTerms}"

i.e. one with {searchTerms} quoted and one unquoted. So we need two templates: the one above and, in addition, the following (the quotation marks are percent-encoded as %22, since a literal " inside the attribute value would end it):

<Url type="application/rss+xml"
     template="http://example.com/?query=%22{searchTerms}%22&amp;startRecord={startIndex}&amp;maximumRecords={count}"/>
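To check that such templates really yield SRU parameters, the sketch below fills both templates the way an OpenSearch client might and decodes the result as an SRU server would. The templates are written after XML-unescaping (&amp; becomes &), the quoted one with its quotation marks percent-encoded as %22; the host, values, and function names are illustrative assumptions. One plausible reason for the quoted variant (an interpretation, not stated in the report) is that quoting lets a multi-word value be treated as a single CQL term.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Templates after XML-unescaping; host and %22 encoding are assumptions.
UNQUOTED = ("http://example.com/?query={searchTerms}"
            "&startRecord={startIndex}&maximumRecords={count}")
QUOTED = ("http://example.com/?query=%22{searchTerms}%22"
          "&startRecord={startIndex}&maximumRecords={count}")

def fill_template(template: str, terms: str, start: int, count: int) -> str:
    """Substitute OpenSearch parameters, percent-encoding the search terms.

    searchTerms is replaced first; quote() encodes any braces in the terms,
    so the later substitutions cannot be triggered by user input.
    """
    return (template
            .replace("{searchTerms}", quote(terms, safe=""))
            .replace("{startIndex}", str(start))
            .replace("{count}", str(count)))

def sru_params(url: str) -> dict:
    """Decode the query string the way an SRU server would see it."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
```

Filling the quoted template with "cat dog", 1 and 10 gives http://example.com/?query=%22cat%20dog%22&startRecord=1&maximumRecords=10, which an SRU server decodes to query="cat dog", startRecord=1, maximumRecords=10.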

Proximity

All proximity issues except one, "proximity units" (below), are deferred for now and may be raised during the OASIS process.

Proximity Units

The March 2006 meeting report says:

Proximity units (other than in the cql set) should be treated such that "unit" itself is a value in a context set, rather than the unit value being a value in a context set.
For example suppose you want to define "street" as a proximity unit, within context set 'xyz'. Do it like this:
prox/xyz.unit="street"
rather than this:
prox/unit=xyz.street
Proximity units 'word', 'sentence', 'paragraph', which are included in the cql set, will be explicitly undefined.

Nothing explicit got into the 1.2 spec to reflect this. Fortunately, nothing prevents it either: it is supported by the BNF, so adding prose to reflect it does not constitute a substantive change.

'prox/xyz.unit="street"' is preferable to 'prox/unit=xyz.street' because whenever you have 'prox/unit=...', 'unit' is a modifier from the cql context set, so its value would have to be one that is defined in the cql context set. 'prox/unit=xyz.street' pairs a modifier from one set with a value from another, which is not good practice.
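The rule can be made concrete with a small checker that splits a prox modifier into its context set, name, and value, and flags a modifier from one set paired with a prefixed value from another. This is a hypothetical helper, not part of any CQL parser: it assumes the '=' relation, no whitespace, that an unprefixed modifier name belongs to the cql set, and that a quoted value is a literal of the modifier's own set.

```python
from typing import Tuple

def modifier_parts(modifier: str) -> Tuple[str, str, str]:
    """Split a prox modifier such as 'xyz.unit="street"' into (set, name, value).

    Simplified: assumes the '=' relation and no whitespace; an unprefixed
    modifier name is taken to belong to the cql context set.
    """
    name, _, value = modifier.partition("=")
    prefix, dot, base = name.rpartition(".")
    return (prefix if dot else "cql", base, value)

def mixes_context_sets(modifier: str) -> bool:
    """True when the modifier comes from one context set and its value from another."""
    mod_set, _, value = modifier_parts(modifier)
    if value.startswith('"'):
        return False  # treated here as a literal value of the modifier's own set
    value_prefix, dot, _ = value.rpartition(".")
    return bool(dot) and value_prefix != mod_set
```

With these helpers, 'xyz.unit="street"' and 'unit=word' pass, while 'unit=xyz.street' is flagged because a cql modifier carries a value from the xyz set.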

One might suggest adding all the units that people come up with to the cql set.  But we don't want to do that. This is a way to support the definition of units in other context sets.

So we propose to add prose, in two places.

1. http://www.loc.gov:8081/standards/sru/specs/cql.html number 9, "Boolean Modifiers":

Proximity units 'word', 'sentence', 'paragraph', and 'element' are included in the CQL context set, and may also be defined in other context sets. Within the CQL set their meanings are explicitly undefined. When defined in another context set they may be assigned specific meaning.

Thus compare "prox/unit=word" with "prox/xyz.unit=word".
In the first, 'unit' is a prox modifier from the CQL set, and as such its values are undefined, so 'word' is subject to interpretation by the server. In the second, 'unit' is a prox modifier defined by the xyz context set, which may assign the unit 'word' a specific meaning.

The context set xyz may define additional units, for example, 'street':

                     prox/xyz.unit="street"

Note that this approach, prox/xyz.unit="street", is preferable to 'prox/unit=xyz.street'. In the first case, 'unit' is a modifier defined in the xyz context set, and 'street' is a value defined for that modifier. In the second, 'unit' is a modifier from the CQL context set, so its value would have to be one defined in the CQL context set, yet 'street' is defined in a different set. Pairing a modifier from one set with a value from another is not good practice.

2. http://www.loc.gov:8081/standards/sru/resources/cql-context-set-v1-2.html under "PROX"

Similar prose.

Action Items

  • Ray
    • revise CQL Bibliographic Searching document   done
    • determine procedures to join the open list
    • apply changes to record metadata document done
    • rmd schema
    • add prox prose to 1.2 spec      done
    • record metadata changes
  • Rob
    • rewrite "OAI Profile for SRU" section of report.
    • introductions for SRW, CQL, and Zeerex. 
    • Zeerex
      • Doublecheck that Z39.92 is completely compatible with SRU
      • Review use of terms "Explain" and "Zeerex" in the 1.2 spec.
  • Rebecca
    • investigate the possibility of NISO taking on bib profile.
  • Rob and Ray
    • inquire about the status of Z39.92 and if there is any prospect for getting it finished.
  • Rob and Dan
    • Write up use case for SRU as an OpenURL Application
  • Matthew
    • Schema and wsdl bindings, for record update.
  • Unassigned
    • Determine which search points are useful for OpenURL