Draft File Content Element List
Date: April 19, 2004
Assumptions for this project
- A librarian can pull up information from a catalog, such as
WorldCat, or from another tool.
- A bibliographic record is optional - users may not be seeing
this data from within a bibliographic record.
Data Elements
Item identifiers - Required
- Standard numbers such as ISBN, ISSN, LCCN, OCLC record control
number, RLG record control number
- Identifier indicator (so we know what to match on)
- Full title (from item, or fielded as in bib records)
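
The identifier indicator exists so that a consumer knows what to match on. A minimal sketch of that idea follows, assuming a hypothetical ItemIdentifier type and a simple normalization rule (dropping hyphens and spaces); neither the names nor the rule come from this draft.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical set of identifier indicators, taken from the examples listed above.
KNOWN_KINDS = {"ISBN", "ISSN", "LCCN", "OCLC", "RLG"}


@dataclass(frozen=True)
class ItemIdentifier:
    kind: str   # identifier indicator: tells the matcher what to match on
    value: str  # the standard number as supplied by the data provider

    def match_key(self) -> tuple[str, str]:
        """Return a normalized (kind, value) pair for catalog matching."""
        kind = self.kind.strip().upper()
        if kind not in KNOWN_KINDS:
            raise ValueError(f"unknown identifier indicator: {self.kind!r}")
        # Assumed normalization: drop hyphens and spaces, uppercase check characters.
        value = self.value.replace("-", "").replace(" ", "").upper()
        return kind, value


a = ItemIdentifier("isbn", "0-306-40615-2")
b = ItemIdentifier("ISBN", "0306406152")
print(a.match_key() == b.match_key())
```

Keeping the indicator next to the value means two records are compared only when they carry the same kind of number.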
Data provider - Required
- Who created this data (used for trust, corrections)
Restrictions on use - Optional
Table of Contents - Optional
Type indicator (so we know what to expect) - Required
- Page images only
  - ID of the file format - Required
  - Sequencing information of the images - Required
  - URL or link to the image - Required if not embedded in file
- Page images and dirty OCR
  - ID of the file format - Required
  - Sequencing information of the images - Required
  - Link between dirty OCR and page images - Required
  - URL or link to the image - Required if not embedded in file
- Accurate, encoded text
  - URL or link to the image - Required if not embedded in file
  - ID of the file format - Required
    - We assume that the file format then drives further
      required and optional elements
  - What we want, i.e., don't throw away data, maintain:
    - Author/title tagging
    - Hierarchical structure
    - Page numbers from the TOC entries (serve to define
      extent of coverage, and can link to content. Could also
      be used to put the contents listing in order if the
      hierarchy can't be used) - Optional
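
The note on TOC page numbers says they can define extent of coverage and can put the contents listing in order when the hierarchy is unavailable. A rough sketch of both uses follows, assuming a hypothetical TocEntry type, integer page numbers, and the rule that an entry runs until the page before the next entry starts; none of these are defined by this draft.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TocEntry:
    title: str
    start_page: int  # page number taken from the TOC entry


def extents(entries: list[TocEntry], last_page: int) -> list[tuple[str, int, int]]:
    """Order entries by page number and infer (title, first_page, last_page)."""
    ordered = sorted(entries, key=lambda e: e.start_page)
    result = []
    for i, entry in enumerate(ordered):
        if i + 1 < len(ordered):
            # Assumed rule: an entry ends on the page before the next one starts.
            end = max(entry.start_page, ordered[i + 1].start_page - 1)
        else:
            end = last_page
        result.append((entry.title, entry.start_page, end))
    return result


entries = [
    TocEntry("Chapter 2", 31),
    TocEntry("Introduction", 1),
    TocEntry("Chapter 1", 9),
]
for title, first, last in extents(entries, last_page=60):
    print(f"{title}: pages {first}-{last}")
```

Front matter numbered with roman numerals would need separate handling, which this sketch leaves out.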
Back-of-the-book Index - Optional
Type indicator (so we know what to expect) - Required
- Page images only
  - ID of the file format - Required
  - Sequencing information of the images - Required
  - URL or link to the image - Required if not embedded in file
- Page images and dirty OCR
  - ID of the file format - Required
  - Sequencing information of the images - Required
  - Link between dirty OCR and page images - Required
  - URL or link to the image - Required if not embedded in file
- Accurate, encoded text
  - URL or link to the image - Required if not embedded in file
  - ID of the file format - Required
    - We assume that the file format then drives further
      required and optional elements
  - What we want, i.e., don't throw away data, maintain:
    - Author/title tagging
    - Hierarchical structure
    - Page numbers from the back-of-the-book entries
      (serve to define extent of coverage, and can
      link to content) - Optional
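
Both the Table of Contents and the Back-of-the-book Index use the type indicator to decide which elements are required. One possible way to check a record against those rules is sketched below; the field names (type, file_format, sequence, ocr_page_links, embedded, url) are hypothetical and only the requirement levels are taken from the lists above.

```python
from __future__ import annotations

# Hypothetical field names; requirement levels mirror the lists above.
REQUIRED_BY_TYPE = {
    "page_images": ["file_format", "sequence"],
    "page_images_ocr": ["file_format", "sequence", "ocr_page_links"],
    "encoded_text": ["file_format"],
}


def missing_elements(record: dict) -> list[str]:
    """Return the names of required elements absent from a TOC or index record."""
    type_indicator = record.get("type")
    if type_indicator not in REQUIRED_BY_TYPE:
        return ["type"]  # the type indicator itself is required
    missing = [
        name for name in REQUIRED_BY_TYPE[type_indicator] if not record.get(name)
    ]
    # A URL or link is required only when the content is not embedded in the file.
    if not record.get("embedded", False) and not record.get("url"):
        missing.append("url")
    return missing


record = {
    "type": "page_images_ocr",
    "file_format": "TIFF",
    "sequence": [1, 2, 3],
    "embedded": False,
}
print(missing_elements(record))  # ['ocr_page_links', 'url']
```

Driving the check from a table keeps the rules in one place if further type indicators are added later.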
Cover art - Optional
- ID of the file format - Required
- URL or link to the image - Required if not embedded in file
- Dimensions - Optional (but highly recommended!)
Reviews - Optional
- Review itself - Required (worst case is a text blob)
- Fullness of review (i.e., full review, review quote) - Optional
- Author - Optional
- Source citation (e.g., journal article) - Optional
- Indicator of source of review (e.g., journal, librarian-supplied,
user-supplied) - Optional
- Length - Optional
- URL or link to the review - Required if not embedded in file
- Positive, negative, neutral? - Very optional
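
As an illustration only, the review elements could be carried in a record like the one below. The class and field names are hypothetical, and measuring length in characters is an assumption, since this draft does not say how length is expressed.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fullness(Enum):
    FULL = "full review"
    QUOTE = "review quote"


class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass
class Review:
    text: Optional[str] = None         # the review itself; worst case a text blob
    url: Optional[str] = None          # required when the text is not embedded
    fullness: Optional[Fullness] = None
    author: Optional[str] = None
    citation: Optional[str] = None     # e.g., a journal article citation
    source_kind: Optional[str] = None  # e.g., "journal", "librarian-supplied"
    sentiment: Optional[Sentiment] = None

    def __post_init__(self) -> None:
        # Either the embedded text or a link to the review must be present.
        if not self.text and not self.url:
            raise ValueError("a review needs embedded text or a URL")

    @property
    def length(self) -> Optional[int]:
        """Length in characters (assumed unit), known only for embedded text."""
        return len(self.text) if self.text else None


quote = Review(text="A careful, readable survey.", fullness=Fullness.QUOTE)
linked = Review(url="http://example.org/reviews/1", source_kind="journal")
print(quote.length, linked.length)
```

Every optional element defaults to None, so a provider that has only a text blob can still supply a valid review.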
Item description (e.g., summary or abstract) - Optional
- Description itself - Required (worst case is a text blob)
- Type (e.g., ONIX short or long) - Optional
- Indicator of source of description (e.g., publisher, author-supplied,
librarian-supplied, user-supplied) - Optional
- URL or link to the description - Required if not embedded in
file
Subject terms (e.g., keywords, LCSH) - Optional
- One or more subject terms - Required
- ID type of keywords or controlled vocabulary - Required
- Indicator of the source of terms (e.g., publisher, author-supplied,
librarian-supplied, user-supplied, other) - Optional
Excerpts - Optional
- Excerpt itself - Required (worst case is a text blob)
- Description of Excerpt (e.g., what chapter?) - Required
- Indicator of source of description (e.g., publisher, author-supplied,
librarian-supplied, user-supplied) - Optional
- URL or link to the excerpt - Required if not embedded in
file