HyperText Markup Language

The Basics


What is HyperText/HyperMedia?

HyperText is the process of making certain words or phrases in a text file link a user to another text file. These files may be located on the same internal server, or anywhere on the Internet. HyperMedia is the term used for hypertext files which contain multimedia elements like graphics, video and audio. Hypertext files for the World Wide Web are encoded in HyperText Markup Language. A HyperText Glossary [http://www.w3.org/hypertext/WWW/Terms.html] is available from the World Wide Web Consortium.

What is a "Markup Language"?

A markup language is a series of predefined codes or tags that are used to define the structural elements of a page of text. The tags generally surround a word or series of words on the page, "marking" them. The markup language is not intended to dictate a representation of the file on paper or on a computer screen. That representation is left to the user interface or presentation program.
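As a minimal sketch of tags "marking" text, the Python snippet below feeds a short, made-up HTML fragment to the standard library's html.parser and records which tag surrounds each run of text. The class name TagLister, the helper list_marked_text, and the sample string are illustrative assumptions, not part of any Library guideline.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record (tag, text) pairs so the structure marked by each tag is visible."""

    def __init__(self):
        super().__init__()
        self._open = []   # stack of tags that are currently open
        self.marked = []  # (innermost tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            self.marked.append((self._open[-1], text))


# Hypothetical sample markup: a heading and a paragraph with an emphasized word.
SAMPLE = "<h1>The Basics</h1><p>HTML marks <em>structure</em>, not appearance.</p>"


def list_marked_text(markup):
    """Return the (tag, text) pairs found in the given markup."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.marked


for tag, text in list_marked_text(SAMPLE):
    print(f"{tag}: {text}")
```

The tags say only that one run of text is a heading and another is emphasized; how a heading or emphasis looks is left to whatever program displays the file.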

HTML and SGML Are Related

HyperText Markup Language (HTML) [http://www.w3.org/MarkUp/] is an application of the Standard Generalized Markup Language (SGML); that is, HTML is a specific set of tags defined using SGML rules. There are several efforts inside the Library to use SGML for finding aids, MARC records, and American Memory. However, HTML is currently the standard markup used for text files presented over the World Wide Web. For more information on SGML, please consult the following documents:

XML, XHTML and HTML

Extensible Markup Language (XML) is a subset of SGML. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Extensible HyperText Markup Language (XHTML) is a reformulation of HTML 4 as an XML 1.0 application.
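One practical consequence of XHTML being an XML application is that a generic XML parser can read it, provided the markup is well formed. The sketch below uses Python's xml.etree.ElementTree on a made-up page; the sample strings and the helper names parses_as_xml and page_title are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML page: every element is closed, including the empty <br />.
WELL_FORMED = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>The Basics</title></head>"
    "<body><p>First line<br />second line</p></body>"
    "</html>"
)

# HTML 4 permits an unclosed <br>, but XML requires every element to be closed.
NOT_WELL_FORMED = WELL_FORMED.replace("<br />", "<br>")


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def page_title(markup):
    """Return the text of the XHTML <title> element."""
    root = ET.fromstring(markup)
    return root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")


print(parses_as_xml(WELL_FORMED))
print(parses_as_xml(NOT_WELL_FORMED))
print(page_title(WELL_FORMED))
```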

Where is all this going? Who knows... but you can follow the standards development process through the World Wide Web Consortium Web Site [http://www.w3.org/].

Standards and Specifications

SGML is ISO (International Organization for Standardization) standard number 8879:1986. The HTML specification, called a Document Type Definition (DTD), is maintained by the World Wide Web Consortium (W3C) [http://www.w3.org/] through the HTML Working Group of the Internet Engineering Task Force (IETF) [http://www.ietf.cnri.reston.va.us/home.html]. The current specification used by the Library of Congress is HTML 4.01, which has been a W3C Recommendation since December 24, 1999. The Library of Congress officially adopted this version of HTML early in February of 2000. The full standard is maintained by the W3C at:

The Library of Congress maintains a guidelines document for HTML 4.01 usage at:

Files and Pages

When you write HTML, you will be creating new files or editing existing files. In most cases these files are created locally, then transferred onto the RS7 server. Once they are loaded onto the RS7, they can be referred to using a Uniform Resource Identifier (URI). When those URIs are entered into a Web browser like Netscape or Internet Explorer, the files, with their embedded HTML, are displayed as Web pages.

The main Web page for a topic, function, or institution is normally called the Home Page. Web pages are often more than one screen long; however, it is recommended that in most cases

What are URIs?

A Uniform Resource Identifier (URI) is the generic scheme for identifying a resource, most often on the Internet. URI is now the name that the W3C uses to identify addresses used for hypertext links in HTML-coded documents. Currently, the URI continues to use the specific format of the Uniform Resource Locator (URL); however, in most documentation, URLs are now generically referred to as URIs.

The standard addressing structure used on the World Wide Web continues to be the URL. It is a standard Internet addressing scheme that allows computer programs to transfer information over the Internet using the appropriate protocol/language (e.g., FTP [file transfer protocol], TELNET [telnet protocol], TN3270 [telnet 3270 protocol], HTTP [hypertext transfer protocol], NNTP [network news transfer protocol]).

An absolute URL contains the full addressing scheme: the Protocol Type (HTTP, TELNET, etc.), the Full Server Address (the computer's name) and Port (when required), and the Path to the file or directory on the server.
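The parts of an absolute URL named above can be pulled apart with Python's standard urllib.parse module. This is a small sketch, not a full URL validator; the helper name describe_url and the dictionary keys are illustrative choices, and the sample address is the Library's public FTP URL.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split an absolute URL into protocol, server, port, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http", "ftp", "telnet"
        "server": parts.hostname,   # the computer's name
        "port": parts.port,         # None when the URL does not state a port
        "path": parts.path,         # path to the file or directory
    }


info = describe_url("ftp://ftp.loc.gov/pub/")
for name, value in info.items():
    print(f"{name}: {value}")
```

Because this URL does not state a port, the port field is None and the client falls back to the default port for the protocol.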

The use of the final slash in URLs that don't end in a file name is highly recommended because some browsers will misinterpret the URL without the final slash:

The URL for the Library of Congress Home Page is: http://www.loc.gov/

The URL for the Library's Public FTP Site is: ftp://ftp.loc.gov/pub/

The URLs for LOCIS are: telnet://locis.loc.gov/ or tn3270://locis.loc.gov/
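One place where the final slash visibly matters is the resolution of relative links: without it, the last path segment is treated as a file name and is replaced rather than extended. The sketch below demonstrates this with urljoin from Python's standard library; the file name readme.txt and the helper resolve are hypothetical, and this is offered as one likely source of misinterpretation, not as a description of any particular browser.

```python
from urllib.parse import urljoin


def resolve(base, relative):
    """Resolve a relative link against a base URL, as a Web client would."""
    return urljoin(base, relative)


# Without the final slash, "pub" is treated as a file and is replaced.
without_slash = resolve("ftp://ftp.loc.gov/pub", "readme.txt")

# With the final slash, "pub/" is treated as a directory and is kept.
with_slash = resolve("ftp://ftp.loc.gov/pub/", "readme.txt")

print(without_slash)
print(with_slash)
```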

For more information on URLs, consult A Guide to URLs (via Brown University) [http://netspace.students.brown.edu/users/dwb/url-guide.html] or A Beginner's Guide to URLs (via NCSA) [http://archive.ncsa.uiuc.edu/demoweb/].


Library of Congress
Library of Congress Help Desk (December 20, 2002)