Background:
A metadata application profile (MAP) is a set of recorded decisions about a shared data target for a given community. MAPs declare what models are employed (what types of entities will be described and how they relate to each other), what controlled vocabularies are used, the cardinality of fields/properties (what fields are required and which fields have a cap on the number of times they can be used), data types for string values, and guiding text/scope notes for consistent use of fields/properties. A MAP may be a multipart specification, with human-readable and machine-readable aspects, sometimes in a single file, sometimes in multiple files (e.g., a human-readable file that may include input rules, a machine-readable vocabulary, and a validation schema).
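As a rough illustration of the kinds of decisions listed above, the following Python sketch records cardinality, data types, controlled vocabularies, and scope notes as data and checks a record against them. Every name in it (`PropertyRule`, `PROFILE`, `validate`, and the sample properties and language codes) is a hypothetical chosen for illustration; it is not drawn from any PCC, BIBFRAME, or RDA profile, and a real MAP would more likely be expressed in a dedicated syntax such as SHACL or ShEx.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """One recorded decision about a property (hypothetical structure)."""
    name: str
    min_count: int = 0                      # 1 or more means the property is required
    max_count: Optional[int] = None         # None means no cap on repetitions
    datatype: type = str                    # expected type of each value
    vocabulary: Optional[frozenset] = None  # allowed terms, if the property is controlled
    scope_note: str = ""                    # guiding text for consistent use


# A toy profile; the properties and codes are illustrative assumptions only.
PROFILE = [
    PropertyRule("title", min_count=1, max_count=1,
                 scope_note="Transcribe the title proper."),
    PropertyRule("language", min_count=1,
                 vocabulary=frozenset({"eng", "fre", "spa"})),
    PropertyRule("extent_pages", max_count=1, datatype=int),
]


def validate(record: dict, profile: list) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for rule in profile:
        values = record.get(rule.name, [])
        if len(values) < rule.min_count:
            errors.append(f"{rule.name}: expected at least {rule.min_count} "
                          f"value(s), found {len(values)}")
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append(f"{rule.name}: expected at most {rule.max_count} "
                          f"value(s), found {len(values)}")
        for value in values:
            if not isinstance(value, rule.datatype):
                errors.append(f"{rule.name}: {value!r} is not of type "
                              f"{rule.datatype.__name__}")
            elif rule.vocabulary is not None and value not in rule.vocabulary:
                errors.append(f"{rule.name}: {value!r} is not in the "
                              f"controlled vocabulary")
    return errors


# A conforming record: prints []
print(validate({"title": ["Moby Dick"], "language": ["eng"]}, PROFILE))
```

Because the rules are plain data rather than prose, the same `PROFILE` could in principle be handed to different tools without reinterpretation.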
Although MAPs do not necessarily have to be machine-actionable (e.g., https://pro.dp.la/hubs/metadata-application-profile), there are benefits to generating these decisions in a machine-parseable way:
- MAPs are more explicit when machine-actionable, leaving less room for interpretation: no matter which platform is used to create the data, the output should be interoperable. If tooling supports the syntax of the MAP (e.g., SHACL, ShEx, YAML, JSON Schema), the MAP can potentially be reused in web form configurations or as a validation target.
- Machine-actionable MAPs can be used to generate human-readable documentation for catalogers, whereas human-readable MAPs are unlikely to be easily converted into machine-actionable files.
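The second benefit can be sketched in a few lines of Python: given a machine-readable profile, cataloger-facing documentation can be derived mechanically. The dictionary layout, the property names, and the functions `cardinality_text` and `render_docs` are assumptions made for this example, not part of any existing profile or tool.

```python
# A toy machine-readable profile (hypothetical layout and property names).
PROFILE = {
    "title": {"min": 1, "max": 1, "note": "Transcribe the title proper."},
    "language": {"min": 1, "max": None,
                 "note": "Use a code from the agreed language list."},
    "extent": {"min": 0, "max": 1, "note": ""},
}


def cardinality_text(min_count, max_count):
    """Turn numeric cardinality into wording a cataloger would recognize."""
    requirement = "Required" if min_count >= 1 else "Optional"
    if max_count is None or max_count > 1:
        repeat = "repeatable"
    else:
        repeat = "not repeatable"
    return f"{requirement}, {repeat}"


def render_docs(profile):
    """Render one plain-text documentation line per property."""
    lines = []
    for name, rule in profile.items():
        line = f"- {name}: {cardinality_text(rule['min'], rule['max'])}"
        if rule["note"]:
            line += f". {rule['note']}"
        lines.append(line)
    return "\n".join(lines)


print(render_docs(PROFILE))
```

Going the other way, from free-text guidance to structured rules, would require a person to interpret the prose, which is why the reverse direction is harder to automate.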
Presently, the most active PCC-related areas working on MAP-like outputs are the PCC Standing Committee on Standards with its policy statements for MARC/RDA and its BSR/CSR metadata application profiles, and the LD4P2 project’s BIBFRAME profile work based on the Library of Congress BIBFRAME pilots. The policy statements are human-readable in nature, while the BIBFRAME profile work can be characterized as experimental and mixes “pure” application profiles about data targets with specifications for the behavior of specific software (BIBFRAME Editor). Meanwhile, there are unanswered questions about reflecting RDA-related decisions in PCC documentation (e.g., where will these profiles be stored, which RDA recording method will be encouraged for a particular situation, and how will we state when PCC decides to diverge from RDA or has extended practice beyond what RDA provides for). More generally, the PCC has the challenge of creating a metadata application profile that simultaneously serves our current MARC workflows, enables conversion of legacy data to RDF, and defines a linked data native target.
It is safe to assume the PCC may not have a clear grasp of MAPs for some time and it would be unfair to task an existing group with the responsibility of maintaining PCC profiles without first understanding the problem space. To that end, this task group is charged with helping PCC understand issues and practices associated with the management of MAPs. This group will also help develop the expertise needed within PCC to work with MAPs. To be clear, this task group is not meant to be a profile creation or maintenance group but, rather, to outline the purpose of profiles and propose next steps for the PCC to take in order to prepare itself to create and maintain profiles that the community can implement.
Charge / Workplan:
Reporting to the PCC Policy Committee (PoCo), the PCC Task Group on Metadata Application Profiles is charged to:
- Define MAPs in the PCC context
- Define use cases for profiles (human-readable documentation, validation, support for related tooling, etc.)
- With PCC Steering and PoCo:
  - Identify the expertise already in PCC and any new experience/skills needed
  - Define base-model assumptions (BIBFRAME, RDA, etc.)
  - Document relationships with stakeholders, LC in particular, and their needs (N.B., LC involvement in profile decisions and management is critical to shared practice in PCC)
  - Account for the need to address both MARC and linked data workflows
  - Explore plausible maintenance/governance models
- Perform an environmental scan of current work in this space, and identify “prior art”
- Identify viable options (JSON Schema, SHACL, ShEx, YAML, etc.) for recording our profile decisions, noting the strengths and weaknesses of each.
- Practically speaking, machine-actionable approaches for MAPs haven’t matured enough to suggest that there is a clear “frontrunner” to implement at this time. Nevertheless, this shouldn’t keep the PCC from committing to a strategy going forward based on current realities; we can convert many of our profile decisions from one syntax and schema to another later, if needed. While it’s tempting to focus on a specific tool’s needs, the PCC should attempt to decouple the data targets (i.e. validation, documentation) from configurations for specific tools, because tool-agnostic MAPs are more easily reusable by others.
- W3C resources
  - Dataset Exchange Working Group main page: https://www.w3.org/2017/dxwg/wiki/Main_Page
  - Profile Guidance: https://w3c.github.io/dxwg/profiles/
- Samvera’s exploration of YAML: https://github.com/samvera-labs/houndstooth/
- Dublin Core Application Profiles: http://www.dublincore.org/specifications/dublin-core/profile-guidelines/ (this was the basis for the BIBFRAME profiles)
- Determine what “shareable application profiles” means in the PCC context (i.e., what is the appropriate level of agreement among PCC libraries?)
- Collaborate with the LD4P2 profiles groups, as appropriate
- Monitor ongoing LD4P2 PCC Cohort discussions (via the two email discussion lists) to track questions and concerns that may have an impact on PCC MAPs
- Recommend actions to PoCo for a plan to create and maintain profiles that meet stated use cases for application profiles
  - What skills do we need to develop?
  - What is a plausible maintenance model?
  - Who should be represented? (long-term home, some combination of LC, PCC Standing Committee on Standards, PCC Linked Data Advisory Committee, and PCC Standing Committee on Applications?)
  - How are decisions made?
Communication:
The group will communicate regularly with PoCo via its monthly calls. PoCo will provide periodic updates on the group’s work to the broader PCC community.
Reports:
Final Report (May 2020) PDF; 307 KB
Report to PoCo (October 15, 2019) PDF; 93 KB
Timeline:
The group will begin its work in May 2019 and will continue working until the charge above is complete. The group should provide a written report to PoCo by the middle of October each year.
Membership:
Chiat Naun Chew (Harvard)
Karen Coyle (W3C, Liaison to the DCMI profiles group)
Corine Deliot (Consultant, British Library)
Nancy Fallgren (Consultant, National Library of Medicine)
Steven Folsom (Cornell)
MJ Han (Co-chair, University of Illinois Urbana-Champaign)
Nancy Lorimer (Stanford)
Lucas Mak (Michigan State)
Honor Moody (Harvard)
Heather Pretty (Memorial University of Newfoundland, PoCo liaison)
Jackie Shieh (Co-chair, Smithsonian)
Cynthia Whitacre (OCLC)
Jodi Williamschen (Library of Congress)
Membership terms will be one year, with the option to renew.