SRU Record Update
June 8, 2007
This is version 1.0 of SRU Record Update
The Record Update service allows for remote maintenance and administration of records within a compliant database. It provides a simple and extensible mechanism for this: a single request/response pair that supports creating, replacing and deleting records and metadata about those records.
Several groups have expressed the need for such a protocol, and many more may benefit from it. In particular, it is required for datasets which are maintained by distributed collaboration and contribution, such as union catalogues, local history databases, book review databases and so on. Going further, it also allows many clients to be created for one service, rather than requiring a very tightly linked client/server relationship.
Although the protocol is being developed under the SRU 'umbrella', there is no need to implement SRU. It would be perfectly feasible to implement Record Update in order to maintain a database served only via OAI, or only via a proprietary HTML interface. The two differ in direction: OAI is a pull mechanism for updating databases and is generally used for scheduled batch processing, while Record Update is a push mechanism intended for more interactive use.
Most simple update mechanisms available on the web have been designed for document updating, on the assumption that a client posts a document to one or more databases or makes a document available for harvesting, and that it is the client (source) that generally controls all maintenance of the document. In contrast, the process of updating metadata in a centralised database has characteristics that argue for a specific updating mechanism.
These characteristics include changes that may occur to the metadata on entry to another database. The system of the target database will typically try to match the metadata with a record already on file and then merge the data, so that an insert actually becomes a merge. Often the system uses a merging profile that allows some incoming fields to be rejected if a similar field already exists and allows others to overwrite existing fields. Some fields will be changed on entry, for example by replacing authors and subjects with preferred forms from authority records. Some fields, such as provenance or classification, may be generated automatically. The receiving system will also typically validate incoming data, sometimes rejecting the whole record and sometimes just the invalid parts of it. These last three cases (authority control, enrichment and validation) can also apply to inserted records that remain unmatched. As no insert command is taken at face value, there is a special requirement for keeping records in two separate systems aligned. The only sure way to guarantee that a record on one database is aligned, and stays aligned, with another is by exchanging record identifiers. In addition, these identifiers allow real-time access into the database for enriched content.
As a consequence, SRU update includes extensive diagnostics that are used to inform a submitting client of any changes that have been made on entry to another database and to convey the record identifier of the record on the target database. It is envisaged that SRU update will be primarily employed in an interactive situation where immediate response to diagnostics is possible. Alternatively, SRU update may run as a real-time background task, with a receiving program capable of processing the identifiers and diagnostics in update responses.
There are two operations in Record Update, the Update operation and the Explain operation. Update (below) contains a parameter which specifies the type of update action to perform (create, replace, or delete), and may be extended by profiles. Explain requests a service description record for the Update operation. The remainder of this document describes the Update operation.
Messages use elements from three different namespaces. WSDL and Schema files are available for them.
The version parameter on both request and response has the same semantics as the version parameter from SRU.
The action parameter determines what action the server should take with the information provided. Actions defined by the base profile and their semantics are:
Further profiles may at their discretion create new identifiers for either new actions or extensions of the above actions with additional processing requirements. (This is the reason why URIs are used to identify the action, rather than just enumerating them by integer.)
It is therefore important to note that the URI 'info:srw/action/1/create', for example, refers to the "create" action defined in this document. Another authority could define its own create action with different semantics. The 'info' authority 'info:srw/action/2', for instance, could define a create action with the identifier 'info:srw/action/2/create', and the two create actions would be distinguished because their URIs are different.
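The way URIs keep actions from different authorities apart can be sketched as follows. This is an illustration only, not part of the specification: the helper names `split_action_uri` and `is_base_profile_action` are hypothetical, and the sketch assumes that the action name is the last path segment of the URI, as in the two example URIs given above.

```python
# Illustrative sketch: helper names are hypothetical, not defined by Record Update.
BASE_AUTHORITY = "info:srw/action/1"          # authority used by this document
BASE_CREATE = "info:srw/action/1/create"      # "create" as defined in this document
OTHER_CREATE = "info:srw/action/2/create"     # example of another authority's "create"


def split_action_uri(uri: str) -> tuple[str, str]:
    """Split an action URI into (authority prefix, action name).

    Assumes the action name is the final '/'-separated segment.
    """
    authority, _, name = uri.rpartition("/")
    if not authority or not name:
        raise ValueError(f"not an action URI: {uri!r}")
    return authority, name


def is_base_profile_action(uri: str) -> bool:
    """True only if the action URI belongs to the base profile's authority."""
    authority, _ = split_action_uri(uri)
    return authority == BASE_AUTHORITY
```

Both example URIs end in the same action name, but only the first is recognised as a base profile action, because the comparison is made on the full URI prefix rather than on the name alone.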
The status of the operation is returned in this field. Defined values are:
A record identifier is a unique way to distinguish a record within the current context. The string may be any means of determining the identity of the record, including but not limited to an identifier string, a reference to a result set and a position within it, or a query which will evaluate to a single record. The recommended solution is an identifier string.
Some servers may also support identifying records by sending them in the record parameter. If the recordIdentifier parameter is present, then the record parameter must not be used in this way.
The server may create a result set with the record and return it as an identifier. If so, the result set should last for a reasonable amount of time, depending on the context, to allow further references to it.
Note: The recordIdentifier parameter in the response is provided for the convenience of those using RecordUpdate with the SRU 1.1 record structure. When used with the SRU 1.2 record structure it is recommended to omit this parameter (because the SRU 1.2 record structure includes a record identifier).
RecordVersions is version information concerning the record. This is a means of tracking changes to a single record such that it maintains a persistent identifier throughout its existence, but the changes can still be tracked and referenced. A server may require that the most recent version of the record be supplied in a request to ensure that the operation is taking place on the most recent copy of the record.
The information is in the form of a list. Each recordVersion entry in the list is a pair consisting of a type and a value. Each type must be unique within the list. All entries must pertain to the same version. For example, if a checksum and a versionNumber are supplied, then the checksum must be that for the given versionNumber.
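The uniqueness rule for the recordVersions list can be checked mechanically. The following is a minimal sketch, assuming the list has already been parsed into (type, value) pairs; the function name `check_record_versions` is hypothetical. The rule that all entries pertain to the same version cannot be verified from the list alone, so the sketch does not attempt it.

```python
# Illustrative sketch: the function name and the pair representation are assumptions.
def check_record_versions(versions: list[tuple[str, str]]) -> None:
    """Raise ValueError if any recordVersion type occurs more than once."""
    seen = set()
    for version_type, _value in versions:
        if version_type in seen:
            raise ValueError(f"duplicate recordVersion type: {version_type!r}")
        seen.add(version_type)


# A checksum and a versionNumber may appear together, once each.
check_record_versions([("versionNumber", "3"), ("checksum", "9f2c")])
```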
The record structure contains the actual record data to be used as part of the operation. It has the same structure and semantics as the SRU record structure.
Profiles and actions may require the presence of the record in either request or response for different actions.
Update also allows for a third type of recordPacking: 'url'. If the value of recordPacking in a request is 'url', then the value of recordData is a string containing a URL to the record to be operated on. The expected use for this is to allow clients to send a reference to a large record, possibly on an alternate site, and for the server to collect it at its leisure.
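A server might resolve recordData along the following lines. This is a sketch under stated assumptions rather than a prescribed implementation: the function names are hypothetical, the other two recordPacking values are taken to be SRU's 'string' and 'xml', and the fetched record is assumed to be UTF-8 text. The fetch step is passed in as a parameter so that a server can defer or replace it.

```python
# Illustrative sketch: function names and the UTF-8 assumption are not from the specification.
from typing import Callable
from urllib.request import urlopen


def fetch_url(url: str) -> str:
    """Retrieve a record from a URL (assumes the body is UTF-8 text)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def resolve_record_data(record_packing: str, record_data: str,
                        fetch: Callable[[str], str] = fetch_url) -> str:
    """Return the record content for a request's recordData."""
    if record_packing == "url":
        # recordData is a URL pointing at the record, not the record itself.
        return fetch(record_data.strip())
    if record_packing in ("string", "xml"):
        # recordData already carries the record.
        return record_data
    raise ValueError(f"unsupported recordPacking: {record_packing!r}")
```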
RecordIdentifier may be present, but must not be a resultSetReference. If present, it is a request that the server use the given identifier for the new record.
RecordVersions may be present, but may only be a 'number'. If present, it is a request that the server use the given version number for the first version of the new record.
Record may be present. If present, it must contain the record to be created. If not present, it is a request for the server to create an empty placeholder record and return a reference to it for later editing.
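The three rules above for a create request could be checked as in the sketch below. The `CreateRequest` class and its field names are hypothetical stand-ins for a parsed request; in particular, whether an identifier is a result set reference is represented here by a simple flag, which is an assumption of this sketch.

```python
# Illustrative sketch: CreateRequest and its fields are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CreateRequest:
    record_identifier: Optional[str] = None
    identifier_is_result_set_ref: bool = False
    record_versions: list[tuple[str, str]] = field(default_factory=list)  # (type, value)
    record: Optional[str] = None


def check_create_request(req: CreateRequest) -> list[str]:
    """Return a list of rule violations (empty if the request is acceptable)."""
    problems = []
    if req.record_identifier is not None and req.identifier_is_result_set_ref:
        problems.append("recordIdentifier must not be a result set reference")
    for version_type, _value in req.record_versions:
        if version_type != "number":
            problems.append(f"recordVersion type {version_type!r} is not allowed on create")
    return problems


def wants_placeholder(req: CreateRequest) -> bool:
    """A create request without a record asks for an empty placeholder record."""
    return req.record is None
```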
RecordIdentifier may be present. When the SRU 1.1 record structure is used, it is recommended that recordIdentifier be present, either as a value or a result set reference. When the SRU 1.1 record structure is not used, it is recommended that recordIdentifier not be present.
RecordVersions may be present.
Record may be present, and it is recommended that it be present if the server has transformed the record in any way.
RecordIdentifier must be present. It identifies the record to be replaced.
RecordVersions may be present. If present, they further identify the record to be replaced.
Record must be present.
Identifiers are crucial for record replace actions and sometimes it is necessary to disambiguate a replace request using the edit replace structure. A metadata record describing a resource on one database may differ substantially from a record for the same resource on another database; when a record replace is sent, it is desirable to indicate unambiguously the nature of the changes being made. This is important so that parts of the record are not inadvertently deleted because they did not appear in the replacement record. Therefore SRU update includes an edit replace structure that can be used in extraRequestData of the request for unambiguously stating the intentions of a request.
In the absence of a dataIdentifier or where the dataIdentifier is ambiguous, the default is to change or delete all occurrences of the specified data which match the specified old value.
In the second field 650, replace the first occurrence of subfield z to read “Vanuatu” where it was “New Hebrides”
In the holdings section of this record, add the institution symbol “TU”
In the holdings section of this record, delete the institution symbol “TU”
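The default rule for a missing or ambiguous dataIdentifier (change or delete every occurrence that matches the old value) can be sketched on a plain list of values. This is an illustration only: it does not model the edit replace structure itself, the function name `apply_edit` is hypothetical, and an optional list position stands in for a dataIdentifier.

```python
# Illustrative sketch: apply_edit and its parameters are hypothetical.
from typing import Optional


def apply_edit(values: list[str], old_value: str,
               new_value: Optional[str] = None,
               position: Optional[int] = None) -> list[str]:
    """Change (or, if new_value is None, delete) occurrences of old_value.

    With no position given, every matching occurrence is affected;
    with a position, only a match at that index is affected.
    """
    result = []
    for index, value in enumerate(values):
        targeted = value == old_value and (position is None or index == position)
        if not targeted:
            result.append(value)
        elif new_value is not None:
            result.append(new_value)
        # Otherwise this is a deletion, so the value is dropped.
    return result
```

Called with only an old and a new value, the function rewrites every match; supplying a position narrows the change to one occurrence, which is the kind of disambiguation a dataIdentifier is meant to provide.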
RecordIdentifier may be present. When the SRU 1.1 record structure is used, it is recommended that recordIdentifier be present; it identifies the new record. When the SRU 1.1 record structure is not used, it is recommended that recordIdentifier not be present.
RecordVersions may be present. If present, it identifies the new record.
Record may be present. It is recommended that it be present if the server has transformed the received record in any way.
The Delete action should only be used when deleting an entire record. When deleting a field, section or defined part of a record, the record replace action should be used, using as necessary the edit replace structure in extraRequestData.
It is recommended that recordIdentifier be present. If present, it identifies the record to be deleted. If it is not present, then Record must be present.
RecordVersions may be present. If present, they further identify the record to be deleted.
Record and recordIdentifier may both be present for redundancy. However, the server is not obliged to cross-check, that is, it may ignore one or the other, but if so should supply a diagnostic. See diagnostics 63 and 64.
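The presence rules for a delete request reduce to a small decision, sketched below. The function name `delete_target` and its return labels are hypothetical. The sketch only reports which elements were supplied; how a server reacts when both are present, including which diagnostic it returns, is left to the server.

```python
# Illustrative sketch: delete_target and its return labels are hypothetical.
from typing import Optional


def delete_target(record_identifier: Optional[str], record: Optional[str]) -> str:
    """Report which request element(s) identify the record to delete."""
    if record_identifier is None and record is None:
        raise ValueError("recordIdentifier is absent, so record must be present")
    if record_identifier is not None and record is not None:
        # The server need not cross-check; if it ignores one, it should say so
        # in a diagnostic.
        return "both"
    return "recordIdentifier" if record_identifier is not None else "record"
```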
RecordIdentifier may be present. If present, it identifies the deleted record. When the SRU 1.1 record structure is not used, and if Record is present, it is recommended that recordIdentifier not be present.
RecordVersions may be present. If present, they further identify the deleted record.
Record may be present. If present, it is the record which was deleted.
Although authentication and authorisation are an important aspect of practically every update system, the messages themselves do not carry any such information in the base profile. This is for two reasons:
1. The protocol cannot predict all of the business logic requirements that an authentication system might require
It is recommended that SRU's authenticationToken system be used if there are no other requirements. This system uses a token to identify the user, but does not specify how that token is initially obtained.
Update does not specify a required record schema, nor does it assume that the records are maintained in any particular schema, but records must still be sent in some schema. The schemas which the system will accept are recorded in the service description record, which is sent in the ExplainResponse message. The server will be prepared to accept any record which will validate against the schemas listed. If there is a preferred schema, it will be noted as the default schema in the configInfo section.
If a server accepts more than one record schema, the server will either transform the record into a native schema or save it as sent. Schemas other than those listed in the service description may be stored as sent or rejected.
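The choice a server faces for an incoming record's schema can be sketched as follows. The function name, the return labels and the `keep_unlisted` policy flag are all hypothetical, and the short schema names in the usage lines are placeholders rather than identifiers defined by this document.

```python
# Illustrative sketch: names, labels and the policy flag are assumptions.
def handle_incoming_schema(schema: str, listed: set[str],
                           keep_unlisted: bool = False) -> str:
    """Decide what to do with a record sent in the given schema."""
    if schema in listed:
        # Listed in the service description: the record is accepted, then
        # either transformed into a native schema or saved as sent.
        return "accept"
    # Unlisted schemas are stored as sent or rejected, at the server's choice.
    return "store-as-sent" if keep_unlisted else "reject"


listed_schemas = {"schema-a", "schema-b"}  # placeholder names
handle_incoming_schema("schema-a", listed_schemas)
```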
The protocol does not make any attempt to dictate how the situation of multiple people editing the same record is to be handled by the server, as different usage scenarios will require different solutions.
Recommended solutions include:
While it is generally expected that the UpdateResponse message will be sent after all processing has been completed, this may not be feasible for large databases. Either the change to the record or any subsequent processing may be delayed, so that the response cannot say whether the operation succeeded or failed because it has not yet been completed.
In this case, the server should return an operationStatus of 'delayed'. The protocol does not specify a mechanism to identify the operation in a future request to discover if it has been completed or not, but solutions might include:
The diagnostics below are defined for use with the namespace info:srw/diagnostic/12
The number in the first column identifies the specific diagnostic within this namespace. Thus, for example, diagnostic 2 below (in the "Msg id" column), "Invalid component: component rejected", is identified by the URI info:srw/diagnostic/12/2.
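The mapping between a diagnostic number and its URI is simple string composition, as the following sketch shows. The function names are hypothetical; the namespace and the example URI are those given above.

```python
# Illustrative sketch: function names are hypothetical.
DIAGNOSTIC_NAMESPACE = "info:srw/diagnostic/12"


def diagnostic_uri(number: int) -> str:
    """Build the URI for a Record Update diagnostic number."""
    if number < 1:
        raise ValueError("diagnostic numbers are positive integers")
    return f"{DIAGNOSTIC_NAMESPACE}/{number}"


def diagnostic_number(uri: str) -> int:
    """Extract the diagnostic number from a URI in this namespace."""
    prefix = DIAGNOSTIC_NAMESPACE + "/"
    suffix = uri[len(prefix):]
    if not uri.startswith(prefix) or not suffix.isdigit():
        raise ValueError(f"not a Record Update diagnostic URI: {uri!r}")
    return int(suffix)


print(diagnostic_uri(2))  # prints info:srw/diagnostic/12/2
```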
August 18, 2008