PREMIS (Preservation Metadata, Data Dictionary Maintenance Activity)
Official Web Site  

PREMIS Implementation Registry

HathiTrust shared digital repository

Activity Name: HathiTrust shared digital repository
Organization: University of Michigan
Organization Type: University
Content Type: Images, Text-based Materials (not selected: Audio, Cartographic Material, Datasets, Other, Video, Websites)
Origin of Content (digitized, born digital, or both): no content origin specified
Description: HathiTrust is a partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. There are more than fifty partners in HathiTrust, and membership is open to institutions worldwide. The HathiTrust Digital Library brings together the immense collections of partner institutions in digital form, preserving them securely to be accessed and used today, and in future generations.
Start Date: 0000-00-00
Stakeholder/Audience: The designated community is essentially “the world.”
Website: http://www.hathitrust.org
Notes: Conforms to PREMIS 2.1, constantly being edited and tweaked.
Implementation Details:
What repository workflows or functions does your PREMIS Metadata support? Our PREMIS metadata primarily serves as a record of the provenance of the digital object. Events include the initial digital capture, compression into a zip archive, calculation of MD5 checksums, ingest of a package into the repository, validation of package components, etc. Additionally, if any changes are made to a SIP to conform it to our AIP structure, we use PREMIS to record the transformations that occur, such as file name remediation or modification of image headers. Rights management is handled by a separate database. Along with the events themselves, we track the associated event agents: who performed the event and, if known, the hardware or software involved, which is also recorded as an associated event agent.
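The event-plus-agent record described above can be sketched in Python. This is only an illustration under stated assumptions, not HathiTrust's actual code: the helper names (`md5_of_bytes`, `build_event`), the event type string, and the agent identifier values are hypothetical. The element names and the `info:lc/xmlns/premis-v2` namespace follow the PREMIS 2.x schema.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "info:lc/xmlns/premis-v2"  # PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Return the namespace-qualified name for a PREMIS element."""
    return f"{{{PREMIS_NS}}}{tag}"


def md5_of_bytes(data):
    """Compute the MD5 checksum of a byte string as a hex digest."""
    return hashlib.md5(data).hexdigest()


def build_event(event_type, detail, outcome,
                agent_type, agent_value, agent_role, when=None):
    """Build one PREMIS 2.x <event> element with a single linked agent.

    All argument values are supplied by the caller; nothing here reflects
    a real repository's vocabulary.
    """
    when = when or datetime.now(timezone.utc)
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when.isoformat()
    ET.SubElement(event, q("eventDetail")).text = detail

    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome

    # The agent may be a person, an organization, or a piece of software.
    agent = ET.SubElement(event, q("linkingAgentIdentifier"))
    ET.SubElement(agent, q("linkingAgentIdentifierType")).text = agent_type
    ET.SubElement(agent, q("linkingAgentIdentifierValue")).text = agent_value
    ET.SubElement(agent, q("linkingAgentRole")).text = agent_role
    return event
```

A caller might compute `md5_of_bytes(payload)` for each file, then call `build_event("message digest calculation", "MD5 checksum computed", "success", "tool", "example-checksum-tool", "software")` and serialize the result with `ET.tostring(event, encoding="unicode")`. All of these argument values are placeholders.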
What PREMIS data model entities are represented in your implementation? We store Events almost exclusively, but are beginning to investigate Object as well, to hold some object-level metadata.
How is preservation metadata stored in your repository? It is stored in a METS XML file along with other information about the package.
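As a rough illustration of PREMIS events stored inside a METS file, the following Python sketch lists the events found in a METS document. The sample document, its `digiprovMD`/`mdWrap` layout, the event values, and the function name `list_events` are assumptions made up for this example; the actual structure of a HathiTrust METS file may differ.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "info:lc/xmlns/premis-v2",
}

# Invented sample: one PREMIS event wrapped in a METS digiprovMD section.
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:premis="info:lc/xmlns/premis-v2">
  <mets:amdSec ID="AMD1">
    <mets:digiprovMD ID="DP1">
      <mets:mdWrap MDTYPE="PREMIS">
        <mets:xmlData>
          <premis:event>
            <premis:eventIdentifier>
              <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
              <premis:eventIdentifierValue>example-1</premis:eventIdentifierValue>
            </premis:eventIdentifier>
            <premis:eventType>ingestion</premis:eventType>
            <premis:eventDateTime>2012-01-01T00:00:00Z</premis:eventDateTime>
          </premis:event>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>
"""


def list_events(mets_xml):
    """Return (eventType, eventDateTime) pairs for every PREMIS event."""
    root = ET.fromstring(mets_xml)
    events = []
    for event in root.findall(".//premis:event", NS):
        event_type = event.findtext("premis:eventType", namespaces=NS)
        event_time = event.findtext("premis:eventDateTime", namespaces=NS)
        events.append((event_type, event_time))
    return events
```

Calling `list_events(SAMPLE_METS)` walks the whole tree, so it finds events regardless of which METS section wraps them.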
What ancillary resources and tools do you use to support your PREMIS implementation?
Controlled Vocabularies: We use well-known controlled vocabularies when possible, such as MARC21 for agent names. A current challenge is knowing which controlled vocabularies exist and should be employed, and how to manage our own internal controlled vocabularies.
Tools: We use a number of tools in our ingest verification process, such as JHOVE and ExifTool.
Extension Schemas:
Registries:
Is your preservation metadata created internally, imported from external sources, or both? Some of both, I believe. We attempt to get as much metadata from content providers as possible as part of the SIP. Some PREMIS events can only be created after the SIP has been received and are therefore created internally.
Is your preservation metadata used only internally or is it shared with external organizations? The METS exists as part of the AIP and is a “functional” METS, meaning it is what the repository web front-end uses to display information to the user. It can also be distributed as-is with certain types of DIPs.
Example File Name: 44_UMHathiTrust_39002066630105 mets.xml
Description of Example Files and/or Additional Example Files: