The Internet and the World Wide Web



A Network for the Worldwide Exchange of Information

The Internet is a worldwide network for the exchange of information. The functions served by the Internet depend on and support an infrastructure of computers and communications networks. This infrastructure has its origin in technologies, devices and practices traceable to the first electronic computers.

The origins of the Internet are rooted in efforts to share information and computational resources. Computers made possible the generation and packaging of information in digital format, and inter-computer communications facilitated the exchange and dissemination of information. Effective communication via the Internet was made possible by the development of methods for the unambiguous addressing and reliable remote delivery of digitized messages.

Research on Inter-Computer Communications

In 1958, the Department of Defense created the Advanced Research Projects Agency (ARPA) as its central research and development organization. In 1962, ARPA undertook research on the utilization and sharing of computer resources, and on inter-computer communications. The ARPA Information Processing Techniques Office (IPTO) was created to pursue an interest in having computers help people communicate with other people, in the context of scientific research and military organizations.

Joseph Carl Licklider was director of the IPTO from its beginning in 1962 to 1964, and later from 1973 to 1975. He developed the concept of a wide area computer network. Licklider envisioned networks of “thinking centers” that incorporated the functions of libraries with anticipated advances in information storage and retrieval. An IPTO objective was to eventually connect Department of Defense computers at the Pentagon, the Cheyenne Mountain Operations Center, and Strategic Air Command Headquarters. Ivan Sutherland, who was IPTO director in 1965, contracted with Lawrence Roberts, at the MIT Lincoln Laboratory, to develop a computer network, and with Thomas Marill to program the network.

Paul Baran, working at the RAND Corporation in the early 1960’s, devised a scheme that would allow U.S. command and control communications to survive a nuclear attack. Baran’s ideas involved a computer-controlled communications network that could operate after losing multiple communication paths. In 1964 he published On Distributed Communications, which described concepts for packet switching networks and noted the greater reliability and survivability of distributed communications networks compared with centralized and decentralized networks.

By 1965 Donald Davies, at the British National Physical Laboratory, had developed packet switching communications concepts, in which transmission bandwidth is allocated dynamically among multiple users instead of being pre-allocated to one user for a session. Sharing a link in this way increases its utilization because data transmission typically occurs in bursts separated by periods of inactivity.

ARPANET

In 1966, then IPTO director Robert Taylor hired Lawrence Roberts as IPTO Chief Scientist. Inspired by Licklider and his successors, and managed by Lawrence Roberts, the IPTO work on computer networks led to the development of ARPANET, which began to be constructed in 1969. ARPANET was the first wide area packet switching network.

In a packet switching network, a computer divides data into packets and uses rules to route them. A packet is transmitted to the next site, which forwards it to the next site, and so on. If transmission fails because the receiving site is not available, transmission is switched to another site. Packets are forwarded, according to an adaptable routing doctrine, until they reach the final destination. While a user’s data transmission is paused, the communication link may be utilized to transmit packets from another user.
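
The forwarding behavior described above can be sketched in Python. This is only an illustration and not the ARPANET routing algorithm: the four-site topology, the site names, the packet size, and both helper functions are assumptions made for the example.

```python
from collections import deque

# Hypothetical four-site topology; the site names are illustrative only.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def split_into_packets(message, size):
    """Divide a message into numbered packets of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def find_route(source, destination, down=frozenset()):
    """Breadth-first search for a path that avoids unavailable sites."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for neighbor in LINKS[path[-1]]:
            if neighbor not in visited and neighbor not in down:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no route is currently available

print(split_into_packets("HELLO WORLD", 4))  # [(0, 'HELL'), (1, 'O WO'), (2, 'RLD')]
print(find_route("A", "D"))                  # ['A', 'B', 'D']
print(find_route("A", "D", down={"B"}))      # ['A', 'C', 'D']
```

When site B is marked unavailable, the same packets still reach D through C, which is the property the paragraph above describes.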

ARPANET used special computers, Interface Message Processors (IMP’s), to provide communications interfaces for each site. Each IMP connected to its site’s host computer, and to a modem that communicated via telephone lines to other sites via their modems and IMP’s. ARPANET’s four original sites were the University of California at Los Angeles (UCLA), Stanford Research Institute (SRI), the University of California at Santa Barbara (UCSB), and the University of Utah. Communications to and from IMP’s used the 1822 protocol. An 1822 data packet consisted of a message type, a numeric host address, and a data field. A Network Control Program (NCP) provided reliable communications links between different processes on different sites, and network utilities that could be shared by applications running on a host computer.

Document Markup and Transmission

By the early 1960’s, documents were routinely created and stored in electronic form through the use of computers. But one of the goals of ARPANET, the easy exchange of large amounts of information between computer sites, proved difficult to achieve.

One difficulty was the low rates of data transmission then available. AT&T developed the T1 four-wire high-speed data link in 1962, with a line rate of 1,544 kilobits per second, of which 1,536 kilobits per second carry data, but this technology did not become widely affordable until 1984. During the 1960’s most long-distance inter-computer communications operated at between 300 bits per second and 50 kilobits per second.
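
To give a sense of scale, the short Python calculation below estimates idealized transfer times at the rates mentioned above. The document size of one million bytes is an assumption chosen for illustration, and protocol overhead and retransmissions are ignored.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / bits_per_second

DOCUMENT_BYTES = 1_000_000  # assumed document size, for illustration only

RATES = [
    ("300 bit/s", 300),
    ("50 kbit/s", 50_000),
    ("1,536 kbit/s", 1_536_000),
]

for label, rate in RATES:
    seconds = transfer_seconds(DOCUMENT_BYTES, rate)
    print(f"{label:>13}: {seconds:,.1f} s")
```

Under these assumptions the transfer takes more than seven hours at 300 bits per second, 160 seconds at 50 kilobits per second, and a little over five seconds at the T1 data rate.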

The other difficulty derived from the independent development and use of different document preparation standards, scripting programs, and printers. Available document preparation programs annotated text in a way syntactically distinguishable from the text in order to define formatting and typesetting instructions. Different programs did this in different ways, and annotations incorporated device-dependent information. Whereas a single facility could in theory standardize practices and equipment, doing so universally was impractical.

GML

In 1967, USNR Captain William W. Tunnicliffe, a member of the Graphic Communications Association, presented a paper in which he described the concept of separating document information content from the specification of format and device-dependent instructions. By 1969, Charles Goldfarb, Ed Mosher and Raymond Lorie, working at IBM, were developing a markup language, a set of markup conventions used for encoding texts, to represent data independently from its format specification. IBM introduced the Generalized Markup Language (GML) in 1973 in the Advanced Text Management System (ATMS). GML was later made part of IBM’s Document Composition Facility (DCF) text formatter.

GML is a document description language that allows defining a document in terms of its contents and organization. GML tags describe a document’s component parts and their order and associations. Tags are used to mark document components such as chapters, subchapters, paragraphs, tables, and lists, indicating relative importance through means such as heading levels.

The increased portability provided by GML facilitated the transfer of documents electronically between remote computer sites. Detailed formatting instructions, such as those specifying page layout, fonts, or line spacing, were provided separately, outside the document, for use in processing documents for particular purposes.

Expansion and Further Development of ARPANET

The number of scientific research and military project sites using ARPANET grew rapidly. By early 1971, the network had grown to connect 15 sites. Utilities such as electronic mail (e-mail), the File Transfer Protocol (FTP), and Network Voice Protocol (NVP) were incorporated during the early 1970’s. ARPANET was designed to tolerate network losses, and as the number of nodes increased, so did its robustness and survivability. In 1975, ARPA handed over the day-to-day operation of ARPANET to the Defense Communications Agency.

In 1972, the Defense Advanced Research Projects Agency (DARPA) was involved in research on a packet radio system that was reliable and could maintain effective communication in the face of jamming and other radio interference. This radio communications work grew to involve end-to-end communications protocols and reliable inter-computer data transmissions. Since the ARPANET NCP did not have the ability to address nodes beyond the destination IMP, work on an enhanced protocol with a more open and extensible architecture was initiated. Robert Kahn led the development of the protocol that was eventually adopted, TCP/IP, the protocol used by the Internet.

By the early 1980’s, the number of ARPANET sites had grown to over 100. In 1983, TCP/IP replaced NCP as the principal protocol for ARPANET, and the MILNET, the military portion of ARPANET, was split off as a separate network.

The Internet

The Internet is a network of networks. It was designed to allow the operation of large dissimilar networks with limited central management. It met scientific data exchange needs, as well as military requirements for robustness and the ability to automatically recover from any node or communication link failure. The Internet uses a packet switching communications paradigm, in which blocks of data are routed between nodes over data links. In each node, data packets are queued before being forwarded.

The Transmission Control Protocol (TCP) is a fault-tolerant procedure for verifying the correct delivery of data from client to server. To account for the possibility of data being lost in the delivery process, TCP supports the detection of missing or erroneous data, and triggers data retransmission until the information is completely and correctly transmitted.
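
The detect-and-retransmit idea can be sketched as follows. This is a simplified stop-and-wait scheme and not TCP itself: the segment layout, the use of CRC-32 as the checksum, and the deliberately unreliable channel are all assumptions made for the example.

```python
import zlib

def make_segment(seq, payload):
    """Attach a sequence number and a CRC-32 checksum to a payload."""
    return {"seq": seq, "payload": payload, "checksum": zlib.crc32(payload)}

def is_intact(segment):
    """Receiver-side check: does the payload still match its checksum?"""
    return zlib.crc32(segment["payload"]) == segment["checksum"]

def send_reliably(chunks, channel, max_tries=5):
    """Send each chunk, retransmitting until an intact copy is delivered."""
    received, transmissions = [], 0
    for seq, payload in enumerate(chunks):
        for _ in range(max_tries):
            transmissions += 1
            delivered = channel(make_segment(seq, payload))
            if delivered is not None and is_intact(delivered):
                received.append(delivered["payload"])
                break  # acknowledged; move on to the next chunk
        else:
            raise ConnectionError(f"segment {seq} was never delivered")
    return b"".join(received), transmissions

def flaky_channel():
    """Hypothetical channel that loses the 2nd transmission and corrupts the 4th."""
    count = 0
    def channel(segment):
        nonlocal count
        count += 1
        if count == 2:
            return None  # segment lost in transit
        if count == 4:
            return {**segment, "payload": b"garbled!"}  # damaged in transit
        return segment
    return channel

data, sent = send_reliably([b"pack", b"et s", b"witch"], flaky_channel())
print(data, sent)  # b'packet switch' 5
```

Three chunks need five transmissions here because one copy is lost and one arrives damaged; the receiver ends up with the complete, correct data either way.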

The Internet Protocol (IP) is a method for transmitting packets of data from one node to another. IP forwards each packet based on a destination address. It operates on gateway machines that move data from one network to another. Internet routers carry out a function similar to that of the earlier ARPANET IMP’s. When a message arrives at an IP router, it uses an internal set of rules to decide where to send it next. Since there is no single predetermined physical path, if a communications link breaks down, data can still reach its destination through an alternate path. The IP protocol operates irrespective of the specific technology used for data transmission between nodes within each network.
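
One common rule for choosing the next hop is longest-prefix matching, sketched below with Python's standard ipaddress module. The text above does not specify the rule set, so this choice, the forwarding table, and the next-hop names are assumptions for illustration.

```python
import ipaddress

# Hypothetical forwarding table: (destination network, next hop).
FORWARDING_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    (ipaddress.ip_network("10.0.0.0/8"), "router-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "router-b"),
]

def next_hop(destination):
    """Return the next hop of the matching entry with the longest prefix."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in FORWARDING_TABLE if address in net]
    network, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop

print(next_hop("10.1.2.3"))   # router-b
print(next_hop("10.9.9.9"))   # router-a
print(next_hop("192.0.2.1"))  # default-gateway
```

The decision depends only on the destination address in each packet, so every packet can be forwarded independently of the ones before it.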

In 1983, after TCP/IP replaced NCP as its principal protocol, ARPANET essentially became a subnetwork of the Internet. The Internet authorities assign ranges of addresses to different organizations. The organizations assign groups of their addresses to departments which may operate local area networks. Within local area networks, each machine has its own network address.

The Domain Name System (DNS) provides a directory of computer host (domain) names and corresponding IP addresses. The names can consist of up to 255 characters. The addresses are used by networking equipment to route information. In the current IPv4 (IP version 4) protocol, addresses consist of four numbers separated by periods, each number in the range 0 to 255, e.g., 123.254.36.97. The IPv4 address scheme can represent about four billion addresses.
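
The dotted notation and the size of the address space can be checked with Python's standard ipaddress module, as in the brief sketch below. The example address is written without leading zeros, since recent versions of the module reject them.

```python
import ipaddress

address = ipaddress.IPv4Address("123.254.36.97")

# The four numbers of the dotted notation, each in the range 0 to 255.
octets = [int(part) for part in str(address).split(".")]
print(octets)  # [123, 254, 36, 97]

# The same address is a single 32-bit integer internally.
as_integer = int(address)
print(as_integer == (123 << 24) + (254 << 16) + (36 << 8) + 97)  # True

# 32 bits give 2**32 possible addresses, about four billion.
print(2 ** 32)  # 4294967296
```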

The global address of documents and other resources is given by a Uniform Resource Locator (URL). The URL consists of two parts, a protocol identifier, and a resource name. The two parts are separated by a colon and two slashes (://). For example, the URL ftp://ftp.abcde.org indicates that the ftp protocol (used for exchanging files) should be used for the resource ftp.abcde.org. The resource name specifies the numerical IP address or the human-readable domain name of the resource.
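
The two parts of a URL can be separated with Python's standard urllib.parse module. The sketch below uses the example URL from the text.

```python
from urllib.parse import urlsplit

parts = urlsplit("ftp://ftp.abcde.org")

print(parts.scheme)    # ftp
print(parts.hostname)  # ftp.abcde.org
```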

A technology called Network Address Translation (NAT) allows one outside IP address to be shared within a network between a number of computers and other devices. Each of the internal devices is given its own IP address within the local network, but to the wider Internet they all appear to come from one address or device.
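
A minimal sketch of the idea follows, assuming a port-based translation table, which is one common way to implement NAT. The class, its method names, the addresses, and the starting port number are hypothetical and chosen only for illustration.

```python
class SimpleNat:
    """Toy NAT: many private (address, port) pairs share one public address."""

    def __init__(self, public_address, first_port=40000):
        self.public_address = public_address
        self.next_port = first_port
        self.outbound = {}  # (private address, private port) -> public port
        self.inbound = {}   # public port -> (private address, private port)

    def translate_outbound(self, private_address, private_port):
        """Map an internal source to the shared address and a unique port."""
        key = (private_address, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_address, self.outbound[key]

    def translate_inbound(self, public_port):
        """Find which internal device a reply on `public_port` belongs to."""
        return self.inbound.get(public_port)

nat = SimpleNat("203.0.113.5")
print(nat.translate_outbound("192.168.1.10", 5000))  # ('203.0.113.5', 40000)
print(nat.translate_outbound("192.168.1.11", 5000))  # ('203.0.113.5', 40001)
print(nat.translate_inbound(40001))                  # ('192.168.1.11', 5000)
```

Both internal devices appear to the outside as 203.0.113.5; only the port number tells their traffic apart.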

The IPv6 protocol is intended to be an IPv4 follow-on, with the capability to accommodate a much greater total number of addresses. IPv6 addresses consist of eight sets of digits separated by colons, each set consisting of up to four hexadecimal digits, e.g., 2008:ED4:8A5:B131:19:43:C6:7. IPv6 allows for larger data packets, and will permit each machine and device to have its own IP address.
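
The IPv6 notation can be explored with the same standard ipaddress module. The sketch below parses the example address from the text and compares the size of the two address spaces.

```python
import ipaddress

address = ipaddress.IPv6Address("2008:ED4:8A5:B131:19:43:C6:7")

# Each of the eight groups padded to four hexadecimal digits.
print(address.exploded)  # 2008:0ed4:08a5:b131:0019:0043:00c6:0007

# IPv6 addresses are 128 bits long, against 32 bits for IPv4.
ratio = 2 ** 128 // 2 ** 32
print(ratio == 2 ** 96)  # True
```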

Internet Information Directories

Information access across the Internet was facilitated by the use of directories that listed information on connected sites. These directories were compiled by the individuals that operated the Internet. As the number of sites grew, it became impractical to maintain a centralized directory to assist users in locating information. The tools Archie, Veronica and Jughead were developed to automatically search directories of files located on public indices.

The Archie data base incorporates file directories from many systems. It responds to file name queries by providing directory paths to the system holding a copy of the desired file. Veronica and Jughead use Gopher, a menu system that simplifies locating and using Internet resources. Jughead facilitates searches of menu items at a Gopher site. Veronica has a data base of menus from many Gopher sites. Upon request, Veronica searches menu items and builds a customized Gopher menu.

SGML

Basic concepts of GML, document definition in terms of its contents and organization, and the separation of document definition from document formatting, were incorporated in the Standard Generalized Markup Language (SGML). SGML is an international standard for the description of marked-up electronic text. Published in 1986 as ISO Standard 8879, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language.

A markup language is a set of markup conventions used for encoding texts. SGML describes how to specify what markup is allowed, what markup is required, and how markup is to be distinguished from text. SGML emphasizes descriptive rather than procedural markup. It specifies that markup should describe a document’s structure and other attributes. Markup languages compliant with SGML standards facilitate the use of programs and data bases for processing documents.

HTML

In 1989, Timothy Berners-Lee envisioned a system for access to information by linking text in a computer to other information elsewhere. The system would be used to access various forms of documentation at CERN (European Organization for Nuclear Research), located near Geneva, Switzerland. The concept was in several ways similar to Licklider’s networks of “thinking centers,” but Berners-Lee had access to tools, practical knowledge, and technology developed during the previous quarter century.

Berners-Lee, working with Robert Cailliau, also at CERN, obtained funding for a project to develop a network-based hypertext system. In 1990 the project began development of prototype components: a hypertext page builder and editor, a web browser, a page transfer protocol, and a web server.

The term hypertext refers to text on a computer that, on demand, leads the user to related information through links and connections called hyperlinks. Hypertext can be made to serve various functions: when designated text or images are clicked on, or hovered over with a cursor, it may display a message, cause a transfer to related text (in a different web page or a different location in the current page), or cause another action, such as displaying an image or playing a recorded sound.

A new markup language was developed to create hypertext, the HyperText Markup Language (HTML). HTML differs from markup languages used in print publishing or electronic text preparation in that it supports the definition of hyperlinks (references that can be followed dynamically from a source document to a target document) and interactive user interfaces.

HTML was developed following SGML guidelines, except for some provisions for specifying hyperlinks and display formatting. HTML tags and their attributes are used to create HTML documents for display in browsers on the Internet. HTML supports the definition of document information content and organization, as well as control of how text and graphics are displayed.

There are close to 100 different HTML tags. Not all browsers support all HTML tags and their attributes, but all browsers support the most commonly used tags. The HTML language is used in constructing most web pages. HTML tags start with a tag opener, the character < (less than sign), and end with a tag closer, the character > (greater than sign). Almost all HTML tags require a paired closing tag. That is, marked up text appears as <tagname> text </tagname>. HTML documents are structured as a head and a body. The head contains information about the document. The body contains the information which is part of the document.

A simple HTML document would look like this:
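
As a hedged illustration of the head and body structure, the Python sketch below embeds a minimal HTML document as a string and lists its tags with the standard-library parser. The page title and text are invented for the example.

```python
from html.parser import HTMLParser

# A minimal HTML document; the title and paragraph text are made up.
SIMPLE_HTML = """<html>
<head>
<title>My First Page</title>
</head>
<body>
<h1>Hello</h1>
<p>This is a paragraph of text.</p>
</body>
</html>"""

class TagLister(HTMLParser):
    """Collect opening and closing tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

lister = TagLister()
lister.feed(SIMPLE_HTML)
lister.close()
print(lister.tags)
```

The list shows each opening tag matched by its closing tag, with the head (containing the title) followed by the body (containing the heading and paragraph).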

The World Wide Web

The system of interlinked documents that Berners-Lee envisioned was given the name World Wide Web. The web’s interlinked documents are accessed via the Internet. World Wide Web documents are composed of web pages, which may contain text, images, videos, sound, and links to other web pages. A web browser is a software application which allows the user to view web pages and to navigate between them using hyperlinks. A web server is computer software that stores web pages and makes them available for network access.

Development of the World Wide Web continued throughout 1991, and an initial software release was made at CERN. Later that year, a demonstration and World Wide Web documentation were made available on telnet, and Berners-Lee released an initial definition of HTML. This information was placed in the public domain.

The Hypertext Transfer Protocol (HTTP) is a protocol for network data transfer. It was developed as a means to send and retrieve hypertext documents over the Internet. For web pages, the use of the HTTP protocol is specified in the URL of the desired World Wide Web resource, e.g., http://www.internetlooks.com.
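
The sketch below shows roughly what a browser sends when it asks a server for a page. It only builds the text of a minimal HTTP/1.1 GET request and does not contact any server; the function name and the choice of headers are assumptions for illustration.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root page
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

request = build_get_request("http://www.internetlooks.com")
print(request)
```

The first line names the method, the resource, and the protocol version; the blank line at the end marks the end of the request headers.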

Web Browsers

A web browser is a software application used to locate, display and interact with web pages. Web browsers are programs that use HTTP to make requests of Internet web servers on behalf of the user. In 1992, the early text-only web browser Lynx was developed by the Distributed Computing Group of the University of Kansas. A new browser, Mosaic, capable of linking to both text and graphics, was developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. Mosaic was widely accepted, although Lynx remained in use for text-only applications.

Several of the developers of NCSA Mosaic went on to create Mosaic Netscape, first released in 1994. Mosaic Netscape was further developed by the Netscape Communications Corporation as Netscape Navigator, and became very popular.

In August 1995, Microsoft introduced the Internet Explorer graphical web browser. Internet Explorer was derived from the earlier NCSA Mosaic browser, through code licensed from Spyglass. Internet Explorer became the most popular web browser after the release of Internet Explorer 5 in March 1999.

Widely used browsers include Microsoft Internet Explorer, Mozilla Firefox, Apple Safari, and the Lynx text-only browser. Graphical browsers such as Internet Explorer and Firefox can display graphics as well as text. Most modern browsers can present multimedia information, including sound and video, although they may require plug-in applications in some cases.

Special web browsers provide World Wide Web access for consumer electronics (CE) devices such as mobile phones, portable media players, and personal digital assistants (PDA’s). Special operating systems, such as the Microsoft Windows CE operating system, support operation of handheld computers and other CE devices.

Web Search Engines

The advent of the World Wide Web led to a rapid increase in the information available over the Internet. Because of its unstructured nature, information in the World Wide Web is not indexed as it would be in a library. It became impractical to search the increasing amount of information using existing file search systems. Web search engines were developed to automate the search for specific information. In 1993, Matthew Gray, at MIT, developed a web crawler named the World Wide Web Wanderer. This crawler was used to build the Wandex web search engine. The web search engine Aliweb was also developed in 1993.

The Jumpstation web search engine was released in early 1994. Jumpstation allowed searching through web page titles, and made use of a web crawler, or spider, to find web pages to search. WebCrawler, which also became available in 1994, was an early full text search engine that made it possible to search for specific words in a web page, a capability that became common in later search engines. Lycos, developed initially at Carnegie Mellon University, was another web search engine that first became available in 1994.

Web search engines not only automated the building of directories listing information on web sites, they provided the ability to perform efficient searches through the directories. A search engine paradigm that evolved had four basic components: a program (crawler or spider) that searched the web, a catalog of web pages or information gleaned from web pages found by the crawler, a user interface for user queries and display of search results, and a utility to search the catalogued information. Not all search engines use this paradigm. Some search engines remain directory-based, and some rely on human-built data entries instead of automated crawlers to collect information.

Many new web search engines appeared in the late 1990’s, among them Yahoo, Excite, Infoseek, Inktomi, AltaVista, Ask Jeeves, Dogpile, and Google. With its simple user interface and an efficient search paradigm, Google is currently the most popular web search engine.

Modern web search engines store information about many web pages, typically retrieved by web crawlers. Crawlers are automated web content finders and gatherers that follow the links they detect. The contents of the retrieved pages are processed to extract key words and relevant information. Data pulled from document titles, headings, or special hypertext fields called meta tags are stored in an index data base. Some search engines store all or part of the source (cache) page as well as information about the web pages; others store every word of every page they find. When a user enters a query into the search engine interface (using key words or phrases), the search utility examines its index and provides a prioritized list of links to matching web pages, usually displayed with the document’s title and text highlights.
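
The index-and-query steps can be sketched with a small inverted index in Python. The pages, their URLs, and the matching rule (a page must contain every query word) are assumptions for illustration; crawling and result ranking are omitted.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "http://example.org/a": "Packet switching networks route packets",
    "http://example.org/b": "Hypertext links connect web pages",
    "http://example.org/c": "Web crawlers follow links between pages",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(PAGES)
print(search(index, "links pages"))  # ['http://example.org/b', 'http://example.org/c']
print(search(index, "packets"))      # ['http://example.org/a']
print(search(index, "telephone"))    # []
```

Because the index is built ahead of time, a query only looks up a few words instead of rereading every stored page.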

The Worldwide Information System

The Internet first became operational with ARPANET’s transition to the TCP/IP protocol in 1983 and the enablement of inter-network connectivity. The timeline of the Internet begins with the earliest thoughts about sharing information through inter-computer communications and includes the period from the sending of the first ARPANET test message on October 29, 1969, to the present. Just as the development of the Internet had the effect of increasing the utilization and sharing of computer resources, the build up of the World Wide Web, starting in 1991, had the effect of increasing the use and spread of the Internet.

Computers, data storage devices, computer terminals, personal computers, and other equipment connected via the Internet constitute the physical locus of a worldwide information system of interlinked documents accessed via the World Wide Web. The scope of this worldwide information system comprises a major portion of human knowledge. Browsers and search engines allow users worldwide to find and display information, imagery, and sound. Electronic mail, messaging services, and other applications support interactive communications.

The Structure of the Internet

The structure of the Internet as a whole follows from the grouping of nodes in networks that serve as hubs for communication with other networks.

A graphical representation of the Internet topology shows that at its core are the largest tightly connected networks. A much larger group of networks are highly connected to one another and to the core. The remaining peripheral networks communicate with the others by passing information through the core.

The topology of the Internet does not correspond to a geographic map. Internet nodes and users are not evenly distributed around the world, and IP addresses are generally not accurate indicators of location. Geographically, Internet use is most heavily concentrated in North America, Europe, and Eastern Asia.

The Internet Corporation for Assigned Names and Numbers (ICANN)

ICANN is the organization responsible for operating the Internet domain name and addressing system. It coordinates, across the world, the unique identifiers that allow computers to know where to find other computers in the Internet.

The ICANN President and staff support the ICANN Board of Directors and the Governmental Advisory Committee in making operational decisions. Registrars, working groups and advisory committees support administration, decision making, and technical solutions.

ICANN was formed in 1998 as a non-profit partnership of people and organizations from all over the world dedicated to keeping the Internet secure, stable and interoperable. The partnership promotes competition and develops policy on the Internet’s unique identifiers.

DNS Root Servers

The Internet Assigned Numbers Authority (IANA) is responsible for management of the DNS root zone. IANA assigns the operators of top-level domains, such as .com and .net, and performs technical and administrative tasks.

DNS root servers function as structural supports for the Internet. They provide authoritative directories that translate human-readable Internet names into network addresses. In March 2008, there were 13 root servers, with a total of 145 root server sites worldwide. Root servers have names of the form letter.root-servers.net, where letter is in the range A to M.

Domain names on the Internet can be regarded as ending in a period or dot. This final period is generally implied rather than explicit. When a computer on the Internet attempts to resolve a domain name, it works from right to left, asking each name server in turn about the element to its left. The root name servers (responsible for the . domain) know which servers are responsible for the top-level domains. In practice, most of the domain server information does not change very often and gets cached, so DNS queries to the root name servers are frequently not necessary.

Each top-level domain (such as .net) has its own set of servers, which in turn delegate to the name servers responsible for individual domain names. The servers responsible for individual domain names in turn answer queries for IP addresses of hosts and sub domains. The hosts and sub domains have the addresses of individual subscribers.
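
The right-to-left walk can be sketched with a toy delegation table in Python. No real DNS queries are made: the zone data, the server descriptions, and the address 192.0.2.10 are all invented for the example, and caching is left out.

```python
# Hypothetical delegation data: zone -> {name: who answers for it}.
ZONES = {
    ".": {"org.": "name server for org."},
    "org.": {"example.org.": "name server for example.org."},
    "example.org.": {"www.example.org.": "192.0.2.10"},
}

def resolve(name):
    """Walk the labels right to left, following one delegation per step."""
    if not name.endswith("."):
        name += "."  # make the implied final dot explicit
    labels = name.rstrip(".").split(".")
    zone, steps, answer = ".", [], None
    for i in range(len(labels) - 1, -1, -1):
        query = ".".join(labels[i:]) + "."
        answer = ZONES[zone][query]  # ask the current zone about the next element
        steps.append((zone, query))
        zone = query
    return answer, steps

answer, steps = resolve("www.example.org")
print(steps)   # [('.', 'org.'), ('org.', 'example.org.'), ('example.org.', 'www.example.org.')]
print(answer)  # 192.0.2.10
```

Each step asks the zone found so far about the name one element further to the left, starting from the root.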

Voice over Internet Protocol

In 1995 the small company VocalTec released software with the ability to support phone communications over the Internet. The software, Internet Phone, was designed to run on a personal computer and used sound cards, speakers, and microphones. The software used the H.323 protocol, designed to provide audio-visual communication sessions on any packet network.

Internet Phone used modems for Internet connection. This resulted in inferior voice quality when compared to a regular telephone connection. Nevertheless, by 1998 phone traffic over the Internet had grown to represent about one per cent of all voice traffic in the United States. By the year 2000, further technical developments and consumer interest led to phone traffic over the Internet amounting to more than three per cent of all voice communications.

The availability of high speed Internet data transmission, such as DSL and Cable, and the development of improved software applications supporting phone communications over the Internet, led to an increase in voice quality for Internet communications and to a growing share of overall voice traffic.

The Voice over Internet Protocol (VoIP) describes voice communications over the public Internet or any packet network employing the TCP/IP protocol suite. VoIP operates in datagram mode, employing the Internet Protocol (IP) for addressing and routing, the User Datagram Protocol (UDP) for host-to-host data transfer between application programs, and the Real-time Transport Protocol (RTP) for end-to-end delivery services.
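
The layering can be illustrated by building one RTP packet in Python. The sketch packs the 12-byte fixed RTP header in front of a block of audio bytes; the function name, the field values, and the placeholder payload are assumptions for the example, and nothing is sent over the network.

```python
import struct

def build_rtp_packet(sequence, timestamp, ssrc, payload, payload_type=0):
    """Prefix `payload` with the 12-byte fixed RTP header."""
    first_byte = 2 << 6                # version 2; no padding, extension, or CSRCs
    second_byte = payload_type & 0x7F  # marker bit clear
    header = struct.pack(
        "!BBHII",                      # network byte order, 1+1+2+4+4 bytes
        first_byte,
        second_byte,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload

audio = bytes(160)  # placeholder for 20 ms of 8 kHz, 8-bit audio samples
packet = build_rtp_packet(sequence=7, timestamp=1120, ssrc=0x1234ABCD, payload=audio)
print(len(packet))  # 172
```

In a real call, each such packet would be handed to UDP as one datagram. The sequence number and timestamp let the receiver reorder packets and smooth out jitter, since UDP itself does neither.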

VoIP typically employs sophisticated predictive compression algorithms, such as Low Delay Code Excited Linear Prediction (LD-CELP), to mitigate issues of latency and jitter over a packet-switched network.

VoIP services convert voice into a digital signal that travels over the Internet. When calling a regular phone number, the signal is converted to a regular telephone signal before it reaches the destination. VoIP can allow making a call directly from a computer, a special VoIP phone, or a regular phone connected to a special adapter. In addition, wireless “hot spots” in locations such as airports, parks, and cafes allow connecting to the Internet and may enable use of a VoIP service wirelessly.

A high speed (broadband) Internet connection is required for modern VoIP. The connection can be through a cable modem or high speed services such as a local area network or DSL. A computer, adapter, or specialized phone is also required. Some VoIP services only work over a computer or a special VoIP phone. Other services allow the use of a regular telephone connected to a VoIP adapter.

If using a computer, appropriate software and a microphone are required. Special VoIP phones plug directly into a broadband connection and operate largely like a regular telephone. If using a telephone with a VoIP adapter, dialing is as usual, and the service provider may also provide a dial tone. Many, but not all, VoIP services connect directly to emergency services through 9-1-1.

Mobile Internet

The Wireless Application Protocol (WAP) is a communications protocol for mobile-specific web pages. Originally released in 1998, it supports mobile internet access. WAP sites are designed to make it easier to display and navigate web pages on mobile devices, such as PDA’s and cell phones. Mobile devices have limited communications, display, and processing capabilities when compared to desktop and laptop personal computers.

WAP sites have text and graphic content designed specifically for small screens that seldom have resolutions greater than about 500 x 500 pixels. WAP browsers provide the basic services of a computer based web browser, but simplified to operate within the restrictions of a mobile device.

WAP uses the Wireless Transaction Protocol (WTP) transmission layer protocol, and the Wireless Session Protocol (WSP) method for establishing and releasing web sessions. Web implementations for mobile devices using WAP employ the Wireless Markup Language (WML) to provide efficient utilization of resources. Increased processing capacity also makes possible some use of XHTML or HTML markup languages. WML web pages have URL’s of the form http://mysite.com/mypage.wml.

The Open Mobile Alliance (OMA) is an organization of wireless equipment and mobile systems manufacturers, software providers, and mobile operators. It was founded in 2002 to provide a forum for mobile industry stakeholders. OMA provides standards for interoperable mobile applications, such as web browsing and messaging.
