Libraries traditionally use unique identifiers for physical items in their collections -- by assigning call numbers and sticking labels on covers. A reader who identifies a book in a catalog can retrieve it by going to the shelf and looking for the label. The call number is the "key" that links the catalog record to the item it identifies. When the item is moved, its identifier goes with it. If library shelves are re-organized, individual call numbers need not be changed, only the signs on the shelves. In conjunction with a map of the stacks, these signs provide vital access support for the reader (or, in the case of closed stacks, for the deck attendant). The map helps the reader "resolve" the call number into a physical location.
Digital resources must also be identified uniquely. Until recently, no attempt was made to provide standard names for digital resources in general, except for very limited applications or in closed systems, such as within a single database. However, a digital library built for the long term cannot be a closed system. It must be built out of modular components that can be supplemented and upgraded as new technology is developed. As in the traditional collection, the name for an item in the digital library will be the "key" that links catalogs, compilations, and references to the item itself. Figure 1 represents the modular design for supporting access to NDL collections: the user interface with which the patron interacts, the tools for access (catalogs, free-text indexes, finding aids, etc.) that support that interface, and the archive that contains the digital collections. Any item in the digital archive is accessible using several tools. The item must have a unique name that all references to it can use.
The characteristics of digital resources pose challenges for naming. A file has no "cover" to which a label can be permanently fixed so that it stays available to any user, even if the file is copied to another computer. The only "cover" for a generic file is its filename. For digital resources that comprise a single file and have a fixed location, filenames do provide a basis for naming. Every file attached to a particular computer, whatever its operating system, must have a name, and the full name (which includes the "path" of nested directories in which the file is stored) must be unique within that computer's file system. Since every computer on the Internet (or any other computer network) must have an identifier unique across the network, the combination of computer identifier and filename provides a unique identifier for any file on the Internet.
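The two-part identifier described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names are hypothetical placeholders, the `FileId` type and `make_file_id` helper are invented for this example, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FileId(NamedTuple):
    """Hypothetical two-part identifier: (computer identifier, full filename)."""
    host: str  # identifier of the computer, unique across the network
    path: str  # full path, unique within that computer's file system


def make_file_id(host: str, path: str) -> FileId:
    """Combine a host name and a full path into one identifier."""
    full_path = PurePosixPath(path)
    if not full_path.is_absolute():
        raise ValueError("the full path, starting at the root, is required")
    # Host names are case-insensitive, so normalize them; paths are kept as given.
    return FileId(host.lower(), str(full_path))


# The same filename on two different computers yields two distinct identifiers.
original = make_file_id("Archive.Example.org", "/collections/item1.txt")
copy = make_file_id("mirror.example.org", "/collections/item1.txt")
```

Neither part is unique on its own: the path repeats on both computers, and each computer holds many files. Only the pair identifies one file.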
Uniform Resource Locators (URLs), the identifiers used on the World Wide Web (WWW) today, generalize the two-part identifier (computer name, file name) by adding a third component specifying the network protocol that should be used to access the file. The addition of the protocol component to the identifier allows names to be given to resources that are not files, such as interactive terminal sessions or database query forms. Some links below point to more details about URLs. The URL approach has proved powerful and flexible for identifying Internet resources and is one of three building blocks on which the World Wide Web is based. The World Wide Web has been a phenomenal success because each of those building blocks was simple to understand, implement, and use in the Internet environment of the early 1990s. The American Memory project took advantage of the WWW environment to provide access to its historical collections across the Internet. Each item in the collections is accessed through its URL.
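As an illustration, the three components can be pulled apart with Python's standard `urllib.parse` module. The URL used here is the D-Lib Magazine address cited elsewhere in this document; the helper name `url_components` is an assumption made for this sketch.

```python
from urllib.parse import urlsplit


def url_components(url: str):
    """Return the (protocol, computer name, file name) parts of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


protocol, computer, filename = url_components(
    "http://www.dlib.org/dlib/february96/02arms.html"
)
# protocol -> "http"
# computer -> "www.dlib.org"
# filename -> "/dlib/february96/02arms.html"
```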
However, there is a problem with URLs as long-term identifiers for digital items and resources. The URL incorporates the names of the computer and files that hold the resource. When a file or resource is moved (perhaps because a computer has failed or no longer has the capacity to handle user demand), the URL is no longer valid. Regular users of the WWW routinely come across links that lead nowhere. In some cases, old links lead to documents that apologize for the inconvenience and provide a link to the new URL, but that is hardly a reliable approach for the long term.
The WWW community recognizes the shortcoming of URLs and has developed the concept of a Uniform Resource Name (URN). A URN is valid for the long term and independent of location, while still being globally unique. Several promising schemes for implementing a system of URNs have emerged. They address the form for names, methods to guarantee global uniqueness, and the design and deployment of a distributed system that provides an efficient address lookup function to "resolve" URNs into pointers to actual locations, with capabilities for publishers/authors/librarians to manage "their" names.
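The idea of resolution can be sketched as a lookup table that maps a persistent name to its current locations. This is a toy, single-machine illustration and not the design of any actual URN proposal: the `NameResolver` class, the example name, and the example URLs are all hypothetical, and a real system would be distributed and would control who may manage each name.

```python
class NameResolver:
    """Toy resolver: maps location-independent names to current URLs."""

    def __init__(self):
        self._locations = {}  # name -> list of URLs where the item now lives

    def register(self, name, url):
        """Assign a new name; refusing duplicates keeps names unique."""
        if name in self._locations:
            raise ValueError(f"name already assigned: {name}")
        self._locations[name] = [url]

    def relocate(self, name, new_urls):
        """Record that the item has moved; the name itself does not change."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = list(new_urls)

    def resolve(self, name):
        """Return the current locations for a name."""
        return list(self._locations[name])


resolver = NameResolver()
name = "urn:example:collection/item-0001"  # hypothetical name
resolver.register(name, "http://old-server.example.org/items/0001.html")
before = resolver.resolve(name)

# The file moves to a new computer: only the resolver's table is updated.
resolver.relocate(name, ["http://new-server.example.org/archive/0001.html"])
after = resolver.resolve(name)
```

Catalogs and references that cite the name keep working after the move, because they never stored the location in the first place.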
At the December 1995 meeting of the Internet Engineering Task Force (IETF), the most active groups with proposals agreed to go ahead and deploy a variety of systems in a way that allows them to work together and be tested through use by the Internet community. For summaries of the state of URN standardization, see Naming conventions for Digital Resources (by Rebecca Guenther of LC's Network Development and MARC Standards Office, January 3, 1996) and Uniform Resource Names: a progress report (by URN implementors, February 1996). There are also links below to the proposals for URN schemes and related materials.
The URN proposals have some commonalities:
LC is working during 1996 with CNRI (Corporation for National Research Initiatives) on a prototype digital archive, based on the Handle System (CNRI's URN scheme) and a Repository that supports management and access control for items in the digital collections. This design can coexist with the current approach, as shown in Figure 4.
URLs -- the identifiers currently used on the World Wide Web
Proposals and implementations for URNs
In February 1996, a group of implementors of schemes for Uniform Resource Names issued a joint progress report in D-Lib Magazine at http://www.dlib.org/dlib/february96/02arms.html. Links to information on individual URN proposals follow:
Identifiers --
(2/19/96)