OpenSearch Compared with SRU

Now Not Too Far Apart...

Derek Lane

CSC Task Order 4047, EPA EIMS/NSDI Support

Disclaimer & Acknowledgements

OpenSearch

OpenSearch can be regarded as a minimal way of labelling web forms and returning the results of a filled-in form in a well-known format.

SRU

Distributed search done in the most web-friendly manner in accordance with Library traditions.

Adoption-oriented Properties

I will track three themes here:

All pronounced ending in 'ay' (just made that up)

Oriented to adoption.

Kwalitee

Quality is deep, a human judgement. Zen and the Art of Motorcycle Maintenance is a thick book exploring quality.

Kwalitee is more suited to what computers do.

Kwalitee reduces uncertainty for the implementor and provides closure. It can also help with the various regressions (in the implementation, in the standard, in external integration requirements).

Kwalitee in Search

Hewmilitee

Humility is associated with mystics, some children...

I ain't humble, and y'all aren't either. Lost cause.

Hewmilitee is the stance that the lowest-tech implementation is preferred.

Easy to use. Helps implementors and improves machine re-use.

Hewmilitee in Search

OpenSearch Hewmilitee

SRU uses existing XML machinery more extensively, has profiles, and tells a very clear story on a standard one-field form.

Hewmilitee: A Plea

Be Kind to Your Contractor

developer with a headache

Contractors Can be Forced to Look Simple by Their Environment

developers forced to be simple

Jenerositee

Generosity is the human impulse to give.

Jenerositee is the avoidance of legal and organizational obstacles to implementation.

Jenerositee in Search

OpenSearch Jenerositee

SRU: unclear copyright and patent status (must... trust... the Library community); costless, no registration.

SRU-OpenSearch Components

Query Specification

SRU/CQL:

dinosaur and (dc.title="fish" or (dc.title="water" and dc.creator="fred"))
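A CQL query like this travels to an SRU server as the `query` parameter of a searchRetrieve request. The following is a minimal sketch, assuming SRU version 1.1 and a hypothetical endpoint `http://example.org/sru`; the helper name `sru_search_url` and the simplified query are illustrative only.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "version": "1.1",                 # assumed SRU version
        "operation": "searchRetrieve",
        "query": cql,                     # urlencode percent-encodes the CQL
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Simplified query in the same style as the slide; the endpoint is hypothetical.
cql = "dinosaur and (dc.title=fish or dc.creator=fred)"
url = sru_search_url("http://example.org/sru", cql)
print(url)
```

The whole boolean expression stays in one parameter, which is the "one-field form" that SRU standardizes.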

OpenSearch:

<Url type="application/rss+xml" template="http://example.com/?q={searchTerms}&amp;auth={dc:creator}"/>
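A client turns such a template into a request URL by substituting each `{name}` placeholder with a percent-encoded value; placeholders written `{name?}` are optional and may be left empty. A minimal sketch of that substitution, where `fill_template` is a hypothetical helper and the template is written as it reads once the XML attribute has been parsed:

```python
import re
from urllib.parse import quote


def fill_template(template, values):
    """Substitute {name} and {name?} placeholders in an OpenSearch URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters are replaced by the empty string
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\?)?\}", replace, template)


template = "http://example.com/?q={searchTerms}&auth={dc:creator}"
print(fill_template(template, {"searchTerms": "frogs and water", "dc:creator": "fred"}))
```

Namespaced parameters such as `{dc:creator}` are handled the same way as the core ones; only the lookup key differs.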

Service Description

Response Formats

Complex Form

original advanced search
Cut-down version of the advanced query part of EIMS at http://oaspub.epa.gov/eims/query.page?frm=advanced

Complex Form: Embedded Search

embedded alternate search
Here we have a common multi-field search (author, format, date, organization) mixed with a known-item (in red) search. An external machine interface should separate these.

Complex Form: Bad Interface

original advanced search
Note that the description search is presented here as a keyword search of type 'description'. Much better for the machine interface to have 'description' search rather than 'keyword' search of type 'description'. (This allows for better type-checking and provides a normal form for fielded search.)

OpenSearch Description

From standard
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Web Search</ShortName>
  <Description>Use Example.com to search the Web.</Description>
  <Tags>example web</Tags>
  <Contact>admin@example.com</Contact>
  <Url type="application/rss+xml"
       template="http://example.com/?q={searchTerms}&amp;pw={startPage}&amp;format=rss"/>
  <Url type="application/atom+xml"
       template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;format=atom"/>
  <Url type="text/html" 
       method="post"
       template="https://intranet/search?format=html">
    <Param name="s" value="{searchTerms}"/>
    <Param name="o" value="{startIndex?}"/>
    <Param name="c" value="{itemsPerPage?}"/>
    <Param name="l" value="{language?}"/>
  </Url>
  <LongName>Example.com Web Search</LongName>
  <Image height="64" width="64" type="image/png">http://example.com/websearch.png</Image>
  <Image height="16" width="16" type="image/vnd.microsoft.icon">http://example.com/websearch.ico</Image>
  <Query role="example" searchTerms="cat" />
  <Developer>Example.com Development Team</Developer>
  <Attribution>
    Search data &amp;copy; 2005, Example.com, Inc., All Rights Reserved
  </Attribution>
  <SyndicationRight>open</SyndicationRight>
  <AdultContent>false</AdultContent>
  <Language>en-us</Language>
  <OutputEncoding>UTF-8</OutputEncoding>
  <InputEncoding>UTF-8</InputEncoding>
</OpenSearchDescription>
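A client discovers what a site can return by reading the `Url` elements of the description document. A minimal sketch using Python's standard library, run against a cut-down copy of the description; the helper name `url_templates` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OpenSearch 1.1 description documents, in ElementTree notation.
OSD_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

# Cut-down description; ampersands must be escaped inside XML attributes.
DESCRIPTION = b"""<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Web Search</ShortName>
  <Url type="application/rss+xml"
       template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;format=rss"/>
  <Url type="application/atom+xml"
       template="http://example.com/?q={searchTerms}&amp;pw={startPage?}&amp;format=atom"/>
</OpenSearchDescription>
"""


def url_templates(description_xml):
    """Map each advertised response MIME type to its URL template."""
    root = ET.fromstring(description_xml)
    return {url.get("type"): url.get("template") for url in root.iter(OSD_NS + "Url")}


templates = url_templates(DESCRIPTION)
for mime_type, template in templates.items():
    print(mime_type, template)
```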

OpenSearch Response

From standard.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" 
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example.com Search: New York history</title>
    <link>http://example.com/New+York+history</link>

    <description>Search results for "New York history" at Example.com</description>
    <opensearch:totalResults>4230000</opensearch:totalResults>
    <opensearch:startIndex>21</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>

    <opensearch:link rel="search"
                     href="http://example.com/opensearchdescription.xml" type="application/opensearchdescription+xml"/>
    <opensearch:Query role="request" searchTerms="New York History" />
    <item>
      <title>New York History</title>

      <link>http://www.columbia.edu/cu/lweb/eguids/amerihist/nyc.html</link>
      <description>
        ... Harlem.NYC - A virtual tour and information on 
        businesses ...  with historic photos of Columbia's own New York 
        neighborhood ... Internet Resources for the City's History. ...
      </description>
    </item>
    <!-- 9 additional <item> elements appear here -->

  </channel>
</rss>
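The three `opensearch:` elements are what make the feed pageable by a machine. A minimal sketch of reading them and working out where the next page starts, run against a cut-down copy of the response; the helper name `paging_info` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

# Cut-down copy of the RSS response.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example.com Search: New York history</title>
    <opensearch:totalResults>4230000</opensearch:totalResults>
    <opensearch:startIndex>21</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>New York History</title></item>
  </channel>
</rss>
"""


def paging_info(rss_xml):
    """Extract the paging counters and item titles from an OpenSearch RSS feed."""
    channel = ET.fromstring(rss_xml).find("channel")

    def number(tag):
        return int(channel.findtext("opensearch:" + tag, namespaces=NS))

    total = number("totalResults")
    start = number("startIndex")
    per_page = number("itemsPerPage")
    next_start = start + per_page
    return {
        "total": total,
        "start": start,
        "per_page": per_page,
        # None when the current page is the last one.
        "next_start": next_start if next_start <= total else None,
        "titles": [item.findtext("title") for item in channel.findall("item")],
    }


info = paging_info(RESPONSE)
print(info)
```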

Performance: Counting

<opensearch:totalResults>4230000</opensearch:totalResults>
Counting accounts for between 14% and 80% of total query time in the sample below (count_t/tot_t).
SQL> select count_t/tot_t perc,subj from statsrch 
     order by perc desc;

      PERC SUBJ
---------- ------------------------------
.804412235 flatter
.802026745 responses
.773255914 life
.748703693 mercury
.698230648 updates
.513219868 frogs
 .38402384 water
.366259595 life
[snip]
.313855368 frogs
.263150272 earth
.236415756 frog
.204388335 frogs
.195137311 frogs
.153592408 frog and water
.140008982 frogs and water

Form to CQL

query.execadvquery?infotype=ALL&subject=frogs&subjtype=ABSTRACT&xpcollection=PESTICIDES&xpcollection=WATER+QUALITY&recppg=10&sortcols=Type

CQL
dc.description=frogs and (eims.xpcollection=PESTICIDES or eims.xpcollection="WATER QUALITY")

OpenSearch Description
template='q={searchTerms?}&amp;xpcollection={eims:xpcollection?}&amp;recppg={count}&amp;sortcols=Type'
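The form-to-CQL step above is mechanical enough to sketch in code. The following assumes that `subjtype=ABSTRACT` maps to the `dc.description` index and that repeated `xpcollection` values are OR-ed together, as on this slide; the mapping table and the helper names are hypothetical, not part of EIMS. Terms containing spaces are quoted, as CQL requires.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from the form's subjtype values to CQL indexes.
SUBJTYPE_INDEX = {"ABSTRACT": "dc.description"}


def quote_term(term):
    """Quote a CQL term if it contains whitespace or reserved characters."""
    if any(c in term for c in ' ()=<>/"'):
        return '"' + term.replace('"', '\\"') + '"'
    return term


def form_to_cql(query_string):
    """Translate the advanced-search form parameters into a CQL query."""
    form = parse_qs(query_string)
    clauses = []
    subject = form.get("subject", [""])[0]
    if subject:
        index = SUBJTYPE_INDEX[form["subjtype"][0]]
        clauses.append(f"{index}={quote_term(subject)}")
    collections = [
        f"eims.xpcollection={quote_term(c)}" for c in form.get("xpcollection", [])
    ]
    if len(collections) > 1:
        clauses.append("(" + " or ".join(collections) + ")")
    elif collections:
        clauses.append(collections[0])
    return " and ".join(clauses)


qs = ("infotype=ALL&subject=frogs&subjtype=ABSTRACT"
      "&xpcollection=PESTICIDES&xpcollection=WATER+QUALITY&recppg=10&sortcols=Type")
print(form_to_cql(qs))
```

Paging and sorting parameters (`recppg`, `sortcols`) are deliberately left out of the CQL; in SRU they belong to the request, not the query.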

Things to Steal From SRU

More Things to Steal From SRU