Comments by Paul Weiss

Final version

As Sally has clearly pointed out, MARC as it is currently constructed can be used to share bibliographic information (aka metadata) about networked resources. It has been used for that purpose for a few years now. Some content designation has been added specifically for use with networked resources, most notably the 856 field. Additional adjustments can be made that will make it even more useful as we move forward. Some of these are relatively minor, adding a field here or a subfield there, and some are more major, such as embedding MARC structure in XML. However, through all these changes what remains constant is the knowledge and experience that we have gained over the years as to what is important in creating, sharing, and using metadata. This is our true intellectual capital, which, I believe, is even more valuable than the actual data in our millions of records.

I believe that one of the most important points in Sally's paper and presentation is that MARC is made up of multiple aspects, which she identifies as content, structure, and markup. Many have viewed MARC as simply markup, but Sally shows that it is in fact far richer than that. Indeed, many other information organization tools that librarianship has developed over the years (AACR2, LCSH, DDC, etc.) have similar multiple aspects in their makeup. Our intimate expertise with these various aspects is at the root of our intellectual capital.

So what is this intellectual capital? We can start with the fact that information resources and metadata that describe them are far more complex and unruly than most people outside of our profession can even guess at. "How complicated can it be to figure out what the title of a book is?" (Well, let me show you any number of conference proceedings, agricultural research station monographs, looseleaf services, or European Union publications.) What else have we learned in our over 30 years of experience with the MARC formats and other standards that will help people identify, search for, and use networked resources on the Web?

Standards

There are too many resources physically in our libraries and now out on the Web for any one institution to create the metadata for all of it. So one of us creates metadata for a resource and we generally share it with the rest of the library community through bibliographic utilities, or by making our catalog accessible on the Web. Since we are sharing our data, and since we want to be able to use various automated tools beyond those we develop ourselves, we have created standards. These help ensure that we can read each other's data, and that system vendors have a large enough base of prospective customers to make it worth investing resources in developing systems that manipulate that data. Interoperability and a need to not reinvent the wheel dictate that we use already existing standards when feasible; the MARC formats refer to several external standards.

It can be helpful to have metastandards. ISO 2709 and Z39.2 are metastandards that provide the general structure for interchangeable records. The Backgrounds and Principles document and the Record Structure and Character Sets sections of the Specifications document form the next level down. They distill the features that are common among all the MARC formats. The specific MARC formats then take that general structure and flesh out specifics for different kinds of data. Granted, in this case the standards were developed in reverse order, but faceting out the levels of structure is still valuable. In the larger Web community, SGML and XML were developed as metastandards. There is growing realization that the Dublin Core is de facto a metastandard rather than a standard in its own right, as many implementations of it add structure. Especially early in the development of standards in a particular area, creating a metastandard for what everyone can agree on allows experimentation with specific standards for specific projects. Lessons learned in these early implementations can then be incorporated into a more general standard.
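To make the lowest layer of this hierarchy concrete, here is a minimal Python sketch of the ISO 2709 record structure as MARC 21 profiles it: a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the variable fields. It assumes ASCII-only content, so one character equals one octet; the function names and sample values are hypothetical placeholders, and a production reader would work in octets and handle character sets and malformed records.

```python
# Sketch of ISO 2709 record structure as profiled by MARC 21.
# Assumption: ASCII-only data, so one character equals one octet.
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # tag (3) + field length (4) + starting character position (5)


def build_record(leader_template: str, fields: list[tuple[str, str]]) -> str:
    """Assemble leader + directory + data, computing lengths and offsets."""
    directory, data = "", ""
    for tag, content in fields:
        field = content + FIELD_TERMINATOR
        # Starting position is relative to the base address of data.
        directory += f"{tag}{len(field):04d}{len(data):05d}"
        data += field
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(data) + 1  # +1 for the record terminator
    # Leader/00-04 record length, 05-11 copied, 12-16 base address, 17-23 copied.
    leader = (f"{record_length:05d}" + leader_template[5:12]
              + f"{base_address:05d}" + leader_template[17:24])
    return leader + directory + data + RECORD_TERMINATOR


def parse_record(record: str) -> dict[str, list[str]]:
    """Use the directory to locate each variable field (repeated tags allowed)."""
    base_address = int(record[12:17])
    directory = record[LEADER_LENGTH:base_address - 1]  # drop directory terminator
    fields: dict[str, list[str]] = {}
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base_address + start:base_address + start + length]
        fields.setdefault(tag, []).append(raw[:-1])  # strip the field terminator
    return fields


# Placeholder record: a control number and a title field.
record = build_record("00000nam a2200000 a 4500",
                      [("001", "ocm12345"),
                       ("245", "10" + SUBFIELD_DELIMITER + "aAn example title.")])
print(parse_record(record)["245"])
```

The point of the layering shows up in the code: nothing in `parse_record` knows what field 245 means. That knowledge belongs to the next level up, the specific MARC format.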

Sometimes it is useful to have multiple standards for the same issue; different communities often have different needs. Library of Congress Subject Headings are used by many libraries in the US, while sizable numbers use either Sears or MeSH instead. In all cases, providing a sound subject retrieval system is the goal. Many school libraries receive MARC records on diskette, while many academic libraries use FTP to move sets of records around; each exchange medium has its own standard.

At the same time, we have learned that standards should be developed and followed only when the benefit of adhering to the standard outweighs the cost of not doing so. Sometimes there is little value in standardizing an aspect of metadata. We do not require any particular structure in the content or markup of data in a General note (500); no strong need for doing that has been articulated. Sometimes there is value, but the cost is too high. Such value judgements may be made as a profession, as in not providing chapter-level subject access, or locally, as in deciding which series to analyze. There are cases in which one community may find it worth standardizing and another not. The school library community finds specific information about audience level very valuable, so the 521 field was given additional structure to accommodate their needs. Meanwhile, most academic libraries, if they record this information at all, record it as free text. And even when standardization would be considered valuable to a user community, it may not be considered so by the community that would need to apply that standard. Witness the situation between librarians and publishers with regard to standardizing the title of a resource everywhere it appears on or in the resource.

Experience has shown us that changes to standards need to be treated in a controlled and explicit way. The "obsolete" concept in MARC has proven to be quite valuable. Documenting changes to the format is crucial for database managers to fully understand their data. Consensus in the profession on how and when to implement changes (AACR1 to AACR2, format integration, Wade-Giles to Pinyin, even the addition of a new source code) has kept the use of our standards standard.

Resources

We have heard many times how different networked resources are from books, etc. It is important to be able to distinguish what is truly new and different with networked resources, and therefore may need new solutions, from what has precedent in the pre-networked world, and may be amenable to existing solutions. For example, the fact that networked resources often change frequently with little explicit notice has a parallel in looseleaf material. Some of the ways we treat looseleafs may work with networked resources. If nothing else, our experience with looseleafs has taught us that some aspects of these variations are more salient than others in identifying a resource. This general idea can be helpful in discussions with nonlibrarians. On the other hand, the quantity of new, as yet undescribed resources now available to our users exceeds any of our historical backlogs by so great an order of magnitude that scalability is essentially a new issue for us.

Metadata

We know that it is important not just to have data that describes a resource (metadata), but also data about that metadata ("metametadata" perhaps). Leader byte 17 (Encoding level) describes the fullness of the record or, to some extent, our confidence in the data. Certainly this has been a useful concept for us in the past, and it would be quite valuable to know about metadata for a networked resource. Another kind of "metametadata" is data about sets of metadata. The electronic file label structure delineated in the Exchange Media section of the Specifications document allows one machine to understand what it is getting from another machine. We also use data that actually describes the relationship between two versions of metadata for the same resource. Leader byte 5 (Record status) tells a system whether this is a new record, a better record, or a record to be deleted. This is an extremely simple and obvious concept to us, but not necessarily to other communities.
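The two leader bytes mentioned above can be read directly by software. The Python sketch below decodes Leader/05 and Leader/17 using the values defined for MARC 21 bibliographic records, then applies a hypothetical local load policy (the `load_action` function and its action names are illustrative assumptions, not part of MARC) to show how a receiving system might act on record status without a human in the loop.

```python
# MARC 21 bibliographic values for Leader/05 (Record status).
RECORD_STATUS = {
    "a": "Increase in encoding level",
    "c": "Corrected or revised",
    "d": "Deleted",
    "n": "New",
    "p": "Increase in encoding level from prepublication",
}

# MARC 21 bibliographic values for Leader/17 (Encoding level); blank means full.
ENCODING_LEVEL = {
    " ": "Full level",
    "1": "Full level, material not examined",
    "2": "Less-than-full level, material not examined",
    "3": "Abbreviated level",
    "4": "Core level",
    "5": "Partial (preliminary) level",
    "7": "Minimal level",
    "8": "Prepublication level",
    "u": "Unknown",
    "z": "Not applicable",
}


def describe_leader(leader: str) -> tuple[str, str]:
    """Return human-readable record status and encoding level."""
    if len(leader) != 24:
        raise ValueError("a MARC leader must be 24 characters")
    status = RECORD_STATUS.get(leader[5], "Undefined")
    level = ENCODING_LEVEL.get(leader[17], "Undefined")
    return status, level


def load_action(leader: str) -> str:
    """Hypothetical local policy: decide what to do with an incoming record."""
    code = leader[5]
    if code == "d":
        return "delete"
    if code == "n":
        return "add"
    if code in ("a", "c", "p"):
        return "replace"  # a better or corrected version of an existing record
    return "review"      # unexpected value: route to a person


print(describe_leader("00000cam a22000007a 4500"))
print(load_action("00000dam a2200000 a 4500"))
```

A receiving system from another community would need exactly this kind of shared, explicit vocabulary before it could safely merge incoming metadata with what it already holds.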

Our experience with MARC reminds us that if data is faceted out and marked up well once, it can be utilized (displayed, processed, searched on) in multiple ways, with the underlying structure of the data often transparent to a particular user. In the acquisitions arena, for example, a library staffer may input bibliographic, order, and check-in data about a new serial on one template in her library system. The system then takes that data and organizes it into three distinct but linked records. When a patron searches for that serial in the OPAC, the system brings together various pieces of data from those records to provide a meaningful display to the patron.
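The serial example can be sketched in a few lines of Python. This is an illustrative model only: the record classes, field names, and placeholder values are hypothetical and do not reflect any particular library system. One staff entry is split into three linked records sharing an identifier, and the OPAC view recombines only the pieces a patron needs, leaving the order record invisible to the patron.

```python
from dataclasses import dataclass


@dataclass
class BibRecord:
    record_id: str
    title: str
    issn: str


@dataclass
class OrderRecord:
    bib_id: str  # link back to the bibliographic record
    vendor: str


@dataclass
class CheckinRecord:
    bib_id: str  # link back to the bibliographic record
    latest_issue: str


def split_template(entry: dict[str, str]) -> tuple[BibRecord, OrderRecord, CheckinRecord]:
    """Facet one staff data-entry template into three linked records."""
    rid = entry["record_id"]
    return (
        BibRecord(rid, entry["title"], entry["issn"]),
        OrderRecord(rid, entry["vendor"]),
        CheckinRecord(rid, entry["latest_issue"]),
    )


def opac_display(bib: BibRecord, checkin: CheckinRecord) -> str:
    """Recombine data from linked records into one patron-facing line."""
    if bib.record_id != checkin.bib_id:
        raise ValueError("check-in record is not linked to this bibliographic record")
    return f"{bib.title} (ISSN {bib.issn}); latest issue received: {checkin.latest_issue}"


# Placeholder values for illustration only.
entry = {"record_id": "r1", "title": "Example Serial", "issn": "0000-0000",
         "vendor": "Example Vendor", "latest_issue": "v. 1, no. 2"}
bib, order, checkin = split_template(entry)
print(opac_display(bib, checkin))
```

Because each fact is captured once and linked, a staff view could just as easily combine the bibliographic and order records without re-entering anything.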

Authority control, as many others have pointed out, is one of the most important areas of expertise that librarians can share. Although much of the authority control we use in libraries is handled by other standards, MARC has its own as well, in the several code lists for languages, organizations, sources, etc. Indeed, the decision about when data should be communicated in coded form deserves careful thought.
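One payoff of coded data is consistent matching. In a MARC 21 bibliographic record, the language of the item is coded in 008/35-37 using the MARC Code List for Languages. The Python sketch below includes only a tiny excerpt of that list, and the helper names are hypothetical. It shows how a single controlled code supports exact retrieval and a display label of our choosing, where free-text variants ("French", "Français", "in French") would each have to be anticipated.

```python
# Tiny excerpt of the MARC Code List for Languages; the real list is far longer.
LANGUAGE_CODES = {"eng": "English", "fre": "French", "ger": "German", "spa": "Spanish"}


def language_code(field_008: str) -> str:
    """Return the coded language from bibliographic 008/35-37."""
    if len(field_008) != 40:
        raise ValueError("a bibliographic 008 field must be 40 characters")
    return field_008[35:38]


def language_label(code: str) -> str:
    """Expand a code for display; unknown codes are flagged, not guessed."""
    return LANGUAGE_CODES.get(code, f"unlisted code {code!r}")


def records_in_language(fields_008: list[str], code: str) -> list[int]:
    """Exact-match retrieval on the coded value (indexes into the input list)."""
    return [i for i, f in enumerate(fields_008) if language_code(f) == code]


# Placeholder 008 fields: only positions 35-37 are filled in meaningfully here.
sample = [" " * 35 + "fre  ", " " * 35 + "eng  "]
print(language_label(language_code(sample[0])))
print(records_in_language(sample, "eng"))
```

The trade-off the paragraph points to is visible here as well: the code only helps if senders and receivers share the same list.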

Other aspects of metadata that we have found useful to standardize include:

Next steps

So we have all this intellectual capital to share with the larger Web community. How do we go about doing that? Getting active in W3C and other organizations is certainly important. Over the years, we have made some attempts to be heard in other communities, but usually without much success. I believe that some of this is due to the sociology and psychology of our profession. We may be proud of what we do, but we only express that well to ourselves. We need to gain enough self-confidence as a profession to be able to express the value of our expertise to others. We also need to learn their lingo. Using library-specific terminology without explanation and explicit relationship to something in the other community's world will not get us very far.

Summary

Here is a summary of the points discussed above that I think we as librarians involved in bibliographic data can bring to the table when discussing access to resources on the Web with members of other communities. Perhaps one of the most important ideas we can bring is that some of the following ideas seem contradictory at first glance, yet each has its flavor of truth. We can help achieve the appropriate balance among the implications of each of these ideas to bring about an information world optimized for success.


Library of Congress
December 21, 2000