Library of Congress Guidelines for HTML 4.01

Introduction

In the Library of Congress, the creation of home pages for the World Wide Web (WWW) is a distributed effort involving many staff members (Library Webmasters) from divisions and offices throughout the Library. The Service Unit managers are responsible for approving all "content" prior to dissemination over the Internet, including Web pages (see Internet Policies of the Library of Congress). Library Webmasters are responsible for following the HTML guidelines and practices as set forth in this document. The Network Development and MARC Standards Office of the Library of Congress is responsible for maintaining and widely distributing these guidelines to all Library Webmasters, selecting and licensing (through ITS) appropriate HTML editors, and teaching classes in the use of HTML to all staff of the Library.

With the availability of this "third edition" of the Library of Congress Guidelines for Hypertext Markup Language, NDMSO is recommending the use of HTML 4.01 along with Cascading Style Sheets. NDMSO will continue to follow Web browser and HTML/SGML developments closely so that, as new specifications are completed and browsers improve, Library-wide standards can be enhanced to take advantage of new features. NDMSO is also responsible for the maintenance of MARC; Z39.50 and related profiles; and Standard Generalized Markup Language (SGML) Document Type Definitions (DTDs), which are being developed with outside participants for MARC records and Encoded Archival Description (formerly the "finding aids" DTD).

Why Standards Are Critical

Library Webmasters will continue to create Web pages for the public and internal Web sites. With just a rudimentary knowledge of Hypertext Markup Language (HTML), it is possible to create a Web page that displays acceptably under the browser the author is using (a "browser" is a WWW client, such as Netscape, Internet Explorer, or Lynx). However, when HTML coding does not conform to accepted standards, Web pages may be rendered unsightly or even unreadable under other browsers or other versions of the same browser. Only conformant HTML coding can be expected to render acceptably in all browsers, thereby presenting usable Web documents to the entire community of users.

In order for Library of Congress Web projects to form a coherent Web site, it is also critical that all staff follow a standard style guide as well as adhere to one HTML standard specification. The Library of Congress World Wide Web Style Guide should be used in conjunction with this HTML guide when creating Web pages for the Library of Congress:

http://www.loc.gov/loc/webstyle/

The HTML 4.01 Reference Specification

The HTML Reference Specification, version 4.01, is maintained by the World Wide Web Consortium (W3C) and is available at:

http://www.w3.org/TR/html40/

The original HTML specification was written by Tim Berners-Lee, now director of W3C, while he was at CERN (European Particle Physics Laboratory). Innovations from the National Center for Supercomputing Applications (NCSA) and other contributors were reviewed under the auspices of the Internet Engineering Task Force (IETF) and published as the HTML 2.0 specification (last updated on September 21, 1995), RFC 1866, edited by Dan Connolly. The next version, HTML 3.2, was finalized on January 14, 1997; this time it was developed in conjunction with commercial organizations including IBM, Microsoft, Netscape, Novell, SoftQuad, Spyglass, and Sun Microsystems. The most current version of the standard, HTML 4.01, was approved by the current W3C membership (see http://www.w3.org/Consortium/Member/List/) in December 1999.

A statement of the current HTML activity is available from W3C at:
http://www.w3.org/MarkUp/

SGML and HTML: Related Markup Languages

What is a "Markup Language"?

A markup language is a series of predefined codes or tags that are used to define the structural elements of a document. The tags generally surround a word or series of words on the page, "marking" them. The markup language is not intended to dictate a presentation of the file on paper or on a computer screen. That presentation is left to the user interface (Web browser) and accompanying style sheets.
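To make the idea of tags "marking" words concrete, the following is a minimal sketch in Python, using only the standard library's html.parser module. The sample markup string and the names TagCollector and marked_text are illustrative assumptions, not part of these guidelines. The sketch reports which tag most closely surrounds each run of text; nothing in it says how that text should look on screen.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record which tag most closely encloses each run of text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are currently open
        self.marked = []     # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost tag only when it matches the end tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.marked.append((self.open_tags[-1], text))


def marked_text(markup):
    """Return (tag, text) pairs for a fragment of markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.marked


# Hypothetical sample: a heading and a paragraph with an emphasized phrase.
sample = "<h1>Annual Report</h1><p>Prepared by <em>the Library</em> staff.</p>"
for tag, text in marked_text(sample):
    print(tag, "->", text)
```

The output pairs each piece of text with a structural role (heading, paragraph, emphasis); choosing fonts, sizes, and colors for those roles is left to the browser and any style sheet.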

Standard Generalized Markup Language (SGML)

SGML is an international standard (ISO 8879:1986) for encoding textual information. It provides a method (i.e., a language) for describing the structure of the information by applying markup to the text.

Subsets of SGML for specific documents or document types are defined in a Document Type Definition (DTD). DTDs have been written for memos, articles, books, technical documentation, historical documents, finding aids for archival materials, and a wide range of other document types.

In most SGML applications, DTDs are written in such a way as to leave the formatting (stylistic appearance) of the text to the application that processes the SGML documents. The codes and definitions describe the hierarchical structure of a document as well as some "value-added" content tagging. Therefore, elements of the text can be marked up based on their meaning within the text (e.g., author, part name, footnote, date, note, etc.).

The HTML DTD

Hypertext Markup Language is actually a special DTD of SGML. It is used specifically to display information on the World Wide Web. The HTML DTD has always combined formatting (presentation) specifications with structural markup tags; with HTML 4.01 and Cascading Style Sheets, however, the separation of "structure" from "appearance" begins.

SGML DTDs at the Library

Several projects in the Library use DTDs so that important content in the text can be captured in a marked up document. The American Memory DTD for historical documents and the Encoded Archival Description (EAD) DTD for finding aids are examples of these. It is possible to view SGML documents over the Web using special plug-in or helper applications. In fact, the Library makes some American Memory and EAD documents available this way.

Converting SGML to HTML

In some cases, it is not practical to provide only an SGML version of a document. Since HTML is a DTD of SGML, it is not difficult to move a document from SGML to HTML. Conversion utilities or word processor macros can be used to convert SGML documents to many other formats, including HTML.

More Information on SGML

Information about SGML can be found at the following Web sites:

Uniform Resource Identifiers (URIs)

A Uniform Resource Identifier (URI) is the generic scheme for identifying a resource, most often on the Internet. URI is now the name that the W3C uses to identify addresses used for hypertext links in HTML-coded documents. At this point in time, the URI continues to use the specific format of the URL; however, in most cases, URLs are now referred to as URIs.

The standard addressing structure used on the World Wide Web is the Uniform Resource Locator. It is a standard Internet addressing scheme that allows computer programs to interpret the server address and use the appropriate Internet protocol (e.g., FTP [file transfer protocol], TELNET [telnet protocol], TN3270 [telnet 3270 protocol], HTTP [hypertext transfer protocol], NNTP [network news transfer protocol]). Absolute URLs contain the full addressing scheme: the Protocol Type (HTTP, TELNET, etc.), the Full Server Address (the computer's name) and Port (when required), and the Path to the file or directory on the server. Between the Protocol and the Server Name in the URL is a colon and two forward slashes (http://www.loc.gov/). Between the machine name and the path is one forward slash (http://www.loc.gov/global/).
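The parts of an absolute URL described above can be pulled apart with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse module; the function name describe_url and the dictionary keys are illustrative assumptions chosen to match the terms used in this section.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split an absolute URL into the parts named in this section."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # text before "://"
        "server": parts.hostname,  # full server address (the computer's name)
        "port": parts.port,        # None when no port is written in the URL
        "path": parts.path,        # begins with the single forward slash
    }


for url in ("http://www.loc.gov/", "http://www.loc.gov/global/"):
    print(url, describe_url(url))
```

For http://www.loc.gov/global/ the sketch reports the protocol "http", the server "www.loc.gov", no explicit port, and the path "/global/".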

* Note - Some servers run protocols on nonstandard ports; if so, the alternate Port number, preceded by a colon, follows the Server Name (e.g., http://www.loc.gov:8081/test/...). Also, the use of the final slash in URLs that don't end in a file name is highly recommended because some browsers will misinterpret the URL without the final slash:

http://www.loc.gov/global/ is an example of a URL ending in a directory name;
http://www.loc.gov/global/explore.html is an example of a URL ending in a filename.
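Both points in the note can be checked with Python's standard urllib.parse module. The sketch below is illustrative only: the URL with port 8081 is shortened to the /test/ directory for the example, and the relative link explore.html is assumed to sit inside the global directory. It shows one concrete way a missing final slash can cause trouble: a relative link is resolved against the wrong directory.

```python
from urllib.parse import urljoin, urlsplit

# A nonstandard port follows the server name, preceded by a colon.
alternate = urlsplit("http://www.loc.gov:8081/test/")
standard = urlsplit("http://www.loc.gov/global/")
print(alternate.hostname, alternate.port)  # explicit port is reported
print(standard.hostname, standard.port)    # no port written, so None

# Resolve the same relative link against a directory URL
# written with and without the final slash.
with_slash = urljoin("http://www.loc.gov/global/", "explore.html")
without_slash = urljoin("http://www.loc.gov/global", "explore.html")
print(with_slash)     # link stays inside the global directory
print(without_slash)  # "global" is treated as a file name and dropped
```

With the final slash the link resolves to http://www.loc.gov/global/explore.html; without it, the last path segment is treated as a file name and the link resolves to http://www.loc.gov/explore.html.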
Other examples of URLs:

* A path is not needed in the telnet protocol, as an interactive logon session is initiated.

Occasionally, a reference to a URL has only a protocol type and server address, and no path (e.g., http://www.loc.gov/). Depending on the configuration of the Web server, either an index of files in the server's root directory will be displayed, or an HTML document named index.html will be retrieved. This file will not be an index per se, but a default home page that is delivered whenever the full path is not specified.
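The choice a server makes for a URL with no file name can be sketched as a small Python function. This is a simplified model under stated assumptions, not the behavior of any particular Web server: the function name resolve_request, the set of files, and the default_document parameter are all hypothetical, and real servers make the default document name and directory listings configurable.

```python
def resolve_request(path, files, default_document="index.html"):
    """Model how a server might answer a request path.

    Returns ("file", name), ("listing", names), or ("missing", None).
    """
    if not path.endswith("/"):
        # The path names a file directly.
        return ("file", path) if path in files else ("missing", None)
    candidate = path + default_document
    if candidate in files:
        # A default home page exists, so it is delivered.
        return ("file", candidate)
    # Otherwise fall back to an index of the files in that directory.
    return ("listing", sorted(f for f in files if f.startswith(path)))


# Hypothetical server contents for illustration.
files = {"/index.html", "/global/explore.html"}
print(resolve_request("/", files))
print(resolve_request("/global/", files))
```

In this model a request for the root directory returns the default home page, while a request for /global/, which has no index.html, returns a listing of the files in that directory.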

Netscape and Internet Explorer Proprietary Tags

Although Netscape Navigator and Microsoft Internet Explorer are the most popular of the Web browsers on the market today, they have added many features and extensions to the HTML tag set that are not universally supported by all browsers. It is not advisable to "optimize" your HTML-encoded documents for viewing on Netscape or any other browser. As long as you adhere to the current specifications for HTML, you will be assured that all browsers can access and read the files you create. Testing on multiple browsers is always advised.

HTML Editors

In order to create HTML documents with some efficiency, it is helpful to use special-purpose software for HTML creation called an "HTML editor." Most of these programs are designed to create new text or open existing ASCII text documents, then apply HTML codes to that text using function buttons and menu selections. For the past several years, the Library has licensed HTML Assistant Pro for use in creating documents in HTML. However, to better support the new HTML standard and the increasing complexity of HTML coding, the Network Development and MARC Standards Office has now recommended that the Library begin using a pairing of HTML editors called Dreamweaver and Homesite. The new editors are more robust than HTML Assistant and offer new features such as WYSIWYG editing, site management, integrated HTML validation, and templating. Most importantly, they produce generally clean HTML code, which has not been true for other WYSIWYG editors and converters.

HTML documents can also be created using standard word processing programs (using macros) or simple text editors such as vi or Windows Notepad (by manually typing in the codes with the text).

HTML Validation

As part of the process of creating HTML documents for the Library of Congress, you will need to validate the HTML you create. Our on-site Weblint Gateway validates at the HTML 3.2 level only:

http://www.loc.gov/cgi-bin/weblint-gw/

Using a fill-in form, supply the URL for your HTML file, then select the button that runs the validator. Alternatively, select the option to Validate Using Local File Names to test files that have not yet been loaded onto a production or test server.

In order to validate HTML 4.01 documents and Cascading Style Sheets, other validation services will need to be used. NDMSO currently recommends that you use the following off-site resources until Weblint has been upgraded to HTML 4:

http://validator.w3.org/ - W3C Web Validation Service
http://jigsaw.w3.org/css-validator/ - W3C CSS Validation Service

HTML Training and Style Guides

HTML training is sponsored by the Library's Internet Operations Group via the Webmaster Technical Training Curriculum and currently administered via the Library of Congress Internal University. For the next available courses, see:

http://www.loc.gov/staff/lciu/

Another fine tutorial for HTML is available on the Web:

A Beginner's Guide to HTML
http://archive.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html

Stylistic conventions and standards for creating Library of Congress information for the World Wide Web are presented in the Library of Congress World Wide Web Style Guide. The Congressional Research Service (CRS) also maintains its own World Wide Web Standards, Guidelines, and Policies document, which is available through the CRS Staff Page.



Library of Congress
Library of Congress Help Desk ( December 20, 2002 )

Maintained by the Network Development and MARC Standards Office
Links to detailed documentation on Web Design Group site are provided.