Architecture of the Internet
The internet is a worldwide network of computers with the property that each computer can send data to and receive data from any other computer on the internet.
A brief history of the internet
The idea of connecting computers via phone lines or some other long-distance network was first tested in 1965, when two university researchers, Larry Roberts and Thomas Merrill, connected a computer in Massachusetts with one in California using a phone line and demonstrated that they could run programs and retrieve data on the remote machine. A key idea behind this research was that the computers would communicate by breaking up their data into many small packets and sending these packets individually. If any packets were lost (due to background noise on the line), they could easily be resent.
This experiment led directly to a DARPA (Defense Advanced Research Projects Agency) proposal in 1967 to build the ARPANET, a military precursor of the internet. In 1968 a group led by Frank Heart at BBN in Cambridge, Massachusetts won the government contract to build the initial ARPANET hardware. In 1969, the initial ARPANET was constructed and consisted of four computers: three in California and one in Utah. In 1972, Ray Tomlinson at BBN wrote the basic software for sending and reading email, Roberts extended it with the first utility program for listing, selectively reading, filing, forwarding, and replying to messages, and email quickly became the most frequently used network application. In 1973, Vint Cerf and Robert Kahn proposed a new set of communication rules for computer networks called TCP/IP (Transmission Control Protocol/Internet Protocol), which allowed users to implement a wide range of network applications including network telephony, email, and network disk sharing. The ARPANET was converted to a TCP/IP network in 1983, at which point it was split into two networks: the MILNET for military applications and the ARPANET for civilian applications. From the late 1970s into the early 1980s, several other networks were developed. These included CSNET (connecting Computer Science departments), USENET (connecting UNIX computers), and BITNET (connecting academic mainframe computers).
The 1980s saw the rapid proliferation of PCs and workstations combined into small local area networks (LANs), and these LANs were added to the ARPANET in growing numbers, resulting in rapid growth of the internet. Also, in 1985, the NSFNET was formed by the National Science Foundation with the stipulation that a university could connect to this network only if it provided access to all scholars at the institution, not just the science departments. Another important development during the 1980s was the connection of networks into a single internet, all using the TCP/IP protocol for communication. The 1990s saw the birth of the World Wide Web and the rapid expansion of the internet, both in size and in its use by the general population.
References: A Brief History of the Internet by Barry M. Leiner, Vinton G. Cerf, David D. Clark, Robert E. Kahn, Leonard Kleinrock, Daniel C. Lynch, Jon Postel, Larry G. Roberts, Stephen Wolff
Internet Addressing: domain names and IP addresses
The internet currently consists of about 40 million servers, although this number grows every month (and actually oscillates minute by minute for reasons that will become clear). Each computer on the internet has a unique identification number called its IP address (for Internet Protocol). An IP address consists of a sequence of four numbers in the range 0-255. For example, a typical IP address at Brandeis in 2003 is 184.108.40.206, where the numbers in the IP address are separated by periods by convention. This is the dotted decimal form of an IP address. IP addresses are actually stored on the computer and transmitted as 32-bit binary numbers. Please read the appendix on binary numbers to learn how they are used to represent decimal numbers.
Most computers on the internet also have an identifying name known as a domain name. For example, the domain name for the main Brandeis web server is www.brandeis.edu and its IP address is 220.127.116.11. The relationship between domain names and IP addresses is available on the net from computers known as domain name servers.
The internet actually consists of a large number of networks which are seamlessly interconnected. For example, the Local Area Network (LAN) at Brandeis University consists of a few thousand computers. These computers are all directly connected to the internet and have IP addresses of the form 129.64.xxx.yyy, where xxx and yyy are numbers in the range 0-255. Conversely, any IP address of this form refers to a computer on the Brandeis LAN. Thus, the Brandeis LAN can expand to include up to 256 x 256 = 65536 computers, which can all be directly connected to the internet simultaneously. This method of allocating IP addresses in blocks is widely used today.
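The relationship between the dotted decimal form and the underlying 32-bit number, and the size of the 129.64.xxx.yyy block, can be checked with a short Python sketch. The address 129.64.3.7 is a made-up example inside that block, and the helper names are illustrative only.

```python
import ipaddress


def dotted_to_int(dotted: str) -> int:
    """Convert a dotted decimal address such as '129.64.3.7' to a 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted decimal IPv4 address: {dotted!r}")
    value = 0
    for p in parts:
        # Each of the four numbers contributes 8 bits.
        value = (value << 8) | p
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def to_bits(dotted: str) -> str:
    """Return the address as a string of 32 binary digits."""
    return format(dotted_to_int(dotted), "032b")


# The block of addresses of the form 129.64.xxx.yyy, written in the
# standard-library notation for "the first 16 bits are fixed".
BRANDEIS_BLOCK = ipaddress.ip_network("129.64.0.0/16")


def in_block(dotted: str) -> bool:
    """True if the address has the form 129.64.xxx.yyy."""
    return ipaddress.ip_address(dotted) in BRANDEIS_BLOCK


if __name__ == "__main__":
    example = "129.64.3.7"  # hypothetical address, chosen for illustration
    print(to_bits(example))
    print(int_to_dotted(dotted_to_int(example)))
    print(BRANDEIS_BLOCK.num_addresses)  # 256 x 256 addresses in the block
    print(in_block(example))
```

Fixing the first two numbers leaves 16 bits free, which is where the 256 x 256 figure comes from.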
The internet is a large collection of interconnected networks. When a message is sent from one computer to another, it is partitioned into smaller packets (so as not to monopolize the network), and each packet is sent individually to the destination. Each packet is passed from computer to computer as it makes its way through the internet. It typically passes from the user's computer to a central router at the university or ISP office. From there it is sent across a series of networks before arriving at the destination computer's ISP, where it is finally forwarded to the destination computer. On UNIX systems you can use the traceroute program to print the list of intermediate computers (and the length of time the message takes to reach each one). For example, we show below a traceroute of a message from a computer at Brandeis University (in Waltham, Massachusetts, USA) to the main web server at the University of Tokyo in Japan.
% traceroute www.u-tokyo.ac.jp
traceroute to www.u-tokyo.ac.jp (18.104.22.168), 30 hops max, 40 byte packets
1 igs.cs-i.brandeis.edu (22.214.171.124) 75.332 ms 2.887 ms 3.342 ms
2 126.96.36.199 (188.8.131.52) 2.558 ms 2.94 ms 3.561 ms
3 184.108.40.206 (220.127.116.11) 4.227 ms 4.087 ms 4.583 ms
4 18.104.22.168 (22.214.171.124) 16.261 ms 9.059 ms 11.282 ms
5 chinng-nycmng.abilene.ucaid.edu (126.96.36.199) 42.434 ms 29.005 ms 33.558 ms
6 iplsng-chinng.abilene.ucaid.edu (188.8.131.52) 48.242 ms 39.73 ms 42.387 ms
7 kscyng-iplsng.abilene.ucaid.edu (184.108.40.206) 42.807 ms 43.039 ms 42.4 ms
8 dnvrng-kscyng.abilene.ucaid.edu (220.127.116.11) 64.458 ms 53.008 ms 103.271 ms
9 sttlng-dnvrng.abilene.ucaid.edu (18.104.22.168) 78.625 ms 78.774 ms 78.308 ms
10 transpac-pwave.pnw-gigapop.net (22.214.171.124) 78.716 ms 79.304 ms 78.448 ms
11 126.96.36.199 (188.8.131.52) 207.539 ms 207.635 ms 207.323 ms
12 wide-ge-tpr3.jp.apan.net (184.108.40.206) 206.627 ms 207.231 ms 207.323 ms
13 foundry3.nezu.wide.ad.jp (220.127.116.11) 208.736 ms 207.737 ms 207.612 ms
14 ra37-vlan560.nc.u-tokyo.ac.jp (18.104.22.168) 216.799 ms 216.645 ms 229.441 ms
15 ra36-vlan3.nc.u-tokyo.ac.jp (22.214.171.124) 216.845 ms 216.945 ms 238.735 ms
17 foundry1.nc.u-tokyo.ac.jp (126.96.36.199) 226.653 ms 216.101 ms 216.02 ms
18 188.8.131.52 (184.108.40.206) 215.942 ms 216.082 ms 216.13 ms
Notice that the message spends some time getting out of Brandeis, then passes through New York City, Chicago, Indianapolis, Kansas City, Denver, and Seattle, and between hops 10 and 11 crosses the Pacific Ocean (the round-trip time jumps from about 78 ms to about 207 ms). It takes a few more hops to get to the University of Tokyo, and then moves through some local routers before it gets to the main web server. This is a typical path for an internet packet.
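The ocean crossing can also be located mechanically by looking for the largest increase in round-trip time between consecutive hops. The following Python sketch parses a few hop lines copied from the traceroute output above; the regular expressions and function names are illustrative assumptions about this particular output format, not part of traceroute itself.

```python
import re

# Four hop lines copied from the traceroute output above.
SAMPLE = """\
9 sttlng-dnvrng.abilene.ucaid.edu (18.104.22.168) 78.625 ms 78.774 ms 78.308 ms
10 transpac-pwave.pnw-gigapop.net (22.214.171.124) 78.716 ms 79.304 ms 78.448 ms
11 126.96.36.199 (188.8.131.52) 207.539 ms 207.635 ms 207.323 ms
12 wide-ge-tpr3.jp.apan.net (184.108.40.206) 206.627 ms 207.231 ms 207.323 ms
"""

# hop number, host name, (address), then the timing fields
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\S+)\)\s+(.*)$")
TIME = re.compile(r"([\d.]+)\s*ms")


def parse_hops(text):
    """Return a list of (hop_number, host, fastest_time_ms) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header line and anything unrecognised
        times = [float(t) for t in TIME.findall(match.group(4))]
        if times:
            # The fastest of the three probes is the least affected by congestion.
            hops.append((int(match.group(1)), match.group(2), min(times)))
    return hops


def largest_jump(hops):
    """Return (from_hop, to_hop, increase_ms) for the biggest time increase."""
    best = None
    for (n1, _h1, t1), (n2, _h2, t2) in zip(hops, hops[1:]):
        delta = t2 - t1
        if best is None or delta > best[2]:
            best = (n1, n2, delta)
    return best


if __name__ == "__main__":
    start, end, delta = largest_jump(parse_hops(SAMPLE))
    print(f"largest jump: hop {start} -> hop {end} (+{delta:.1f} ms)")
```

Using the fastest probe per hop is one reasonable choice; averaging the three probes would also work but is more sensitive to a single slow reply.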
References: Traceroute from all over the world
Ports, Sockets, and Services
The computers on the internet interact in a wide variety of ways, but their interaction is nonetheless restricted. It would not be wise to allow any computer on the internet to have full access to every other computer on the net, because an unscrupulous user might decide to delete all of your disk files or otherwise use your computer without permission. To get around this problem, the internet is modelled on an abstract view of the net in which each computer specifies exactly what kinds of interactions it will allow. These types of interactions are called services, and each computer on the net can offer up to 65536 services. Each service is identified by a number from 0 to 65535 called a port. Typically, the ports with numbers under 1024 are reserved for system services (such as email and web page serving), but anyone is free to offer any service they please on ports numbered 1024 and above.
A computer that offers a service to another computer is called a server, and a computer that requests a service is called a client. It is typical for computers on the internet to be both clients and servers at the same time. The communication between client and server is initiated by the client, which specifies the IP address of the server computer and the port number of the service to be provided. If the specified computer is offering that service, then a special connection called a socket is created. The socket allows the two computers to send data back and forth between themselves.
References: Wikipedia and in particular the Router entry
NOTE: Wikipedia is completely untrustworthy because anyone can vandalize any page at any time. Nevertheless, many of its pages are not currently vandalized and hence contain useful information. The only way you can have any trust that something you read on Wikipedia is true is to fact-check it against an authoritative source. Wikipedia articles are encouraged to provide evidence for all claims in the form of links to authoritative sources.
Common Services on the net
Some of the more common system services are listed below. Each service has a set of rules governing how the client and server interact. These rules are called protocols and they simply represent the conventions that the two computers will use when communicating on that port.
* Echo (port 7) an echo service, simply echoes back what it receives
* Daytime (port 13) returns the date and local time and ignores client input
* FTP (ports 20, 21) allows the client to transfer files of data to and from the server
* Telnet (port 23) allows the client to interact with the server's operating system remotely
* SMTP (port 25) offers an email service for delivering email to a user on the server
* DNS (port 53) domain name serving, returns IP addresses for domain names
* WWW (port 80) uses the HTTP protocol and sends specified web pages to the client
* POP3 (port 110) offers another email service
Some Common Internet Services

You can access some of these ports from Linux using the telnet command. Below we give examples of accessing the daytime and echo services, respectively. The daytime service returns the local time on the server being queried. The echo service is used for testing whether a connection is active and simply echoes back each line of text that it receives.

USER % telnet www.cs.brandeis.edu 13
Trying 220.127.116.11...
Connected to diamond.cs.brandeis.edu.
Escape character is '^]'.
Thu Aug 31 15:55:41 2000
Connection closed by foreign host.

Accessing the daytime service on port 13

USER % telnet www.cs.brandeis.edu 7
Trying 18.104.22.168...
Connected to diamond.cs.brandeis.edu.
Escape character is '^]'.
USER This is the echo port
This is the echo port
USER bye bye

Accessing the echo service on port 7

References: IANA Port Assignments
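The client and server roles in the echo session can be sketched in Python with the standard socket module. This is only an illustration under stated assumptions: the server runs on the local machine (127.0.0.1) on a free high-numbered port picked by the operating system rather than on the reserved port 7, it serves a single client, and the function names are made up for this example.

```python
import socket
import threading


def serve_echo_once(server_sock):
    """Accept one client and send back everything it sends until it stops."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # the client has finished sending
                break
            conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Start a local echo server, send it a message, and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # Port 0 asks the operating system for any free port.
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        host, port = server_sock.getsockname()

        worker = threading.Thread(
            target=serve_echo_once, args=(server_sock,), daemon=True
        )
        worker.start()

        # The client names the server's IP address and port to open a socket.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
            chunks = []
            while True:
                data = client.recv(1024)
                if not data:
                    break
                chunks.append(data)

        worker.join(timeout=5)
        return b"".join(chunks)


if __name__ == "__main__":
    print(echo_roundtrip(b"This is the echo port\n"))
```

A real echo service would keep accepting new clients in a loop; handling exactly one connection keeps the sketch short.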
Web Browsers and Servers
The HTTP service is perhaps the most revolutionary service that has been developed for use on the internet. It provides a mechanism for clients to access files on the server by giving the name of the file in the web server's folder. The HTTP server responds to such a request by returning several lines of information about the file (what kind of data it contains, such as text, image, movie, or sound; when it was last modified; how large it is; and so on), followed by the contents of the file itself. HTTP services are generally provided on port 80.
The HTTP service is one half of the technological foundation of the World Wide Web. The other half is the HTML language. HTML stands for Hypertext Markup Language. HTML specifies the layout of webpages and provides mechanisms for including links to other webpages and to images, sounds, movies, and other content. In the next chapter we will provide an introduction to HTML and some related technology (CSS and XML).
Below we give an example of the use of this service to request the default web page of the folder "/~cs2a/" from the server "www.cs.brandeis.edu". Observe that the request (the line beginning with GET) specifies the page to access, and that the response provides quite a bit of information about the file (the header lines from Date through Content-Type), including its size, its last modification date, what type of information is in the file, the kind of server that is providing the service, the time (in GMT) at which the page is being served, and some more arcane information, followed by the actual web page content itself (the HTML at the end).
% telnet www.cs.brandeis.edu 80
Trying 22.214.171.124...
Connected to diamond.cs.brandeis.edu.
Escape character is '^]'.
GET /~cs2a/ HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 02 Sep 2003 21:20:33 GMT
Server: Apache/1.3.26 (Unix)
Last-Modified: Tue, 02 Sep 2003 21:20:16 GMT
ETag: "44b052-221-3f550990"
Accept-Ranges: bytes
Content-Length: 545
Connection: close
Content-Type: text/html

<HTML>
<TITLE>Brandeis University, Intro to Computers, Cosi 2a, Aut 03</TITLE>
<BODY style="background:blue">
<META HTTP-EQUIV="Refresh" CONTENT="2; URL=http://frege.cs-i.brandeis.edu:9090/cs2a03/index.html">
<center>
<h1 style="background:white; border:thick double blue; font:bold 40pt Helvetica">
The Home page for CS2a<br>
has moved to<br>
<A HREF="http://frege.cs-i.brandeis.edu:9090/cs2a03/index.html">
http://frege.cs-i.brandeis.edu:9090/cs2a03/index.html
</A>
</h1>
Sorry for the inconvenience.
<p>
Tim Hickey
</center>
</BODY>
</HTML>
Connection closed by foreign host.

Accessing the HTTP service on port 80
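The structure of this exchange (a request line, then a block of header lines, a blank line, and the page content) can be illustrated with a small Python sketch. It does not contact any server: it builds the same request text that was typed into telnet and splits a canned response into its parts. The response below reuses some header lines from the session above with the HTML body abbreviated, and the function names are illustrative assumptions.

```python
def build_get_request(path: str, version: str = "HTTP/1.0") -> bytes:
    """Build the text typed into telnet: a GET line followed by a blank line."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, reason, headers, body)."""
    # A blank line separates the header lines from the content.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The first line looks like: HTTP/1.1 200 OK
    _version, status, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Header lines taken from the session above; the body is abbreviated here.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Tue, 02 Sep 2003 21:20:33 GMT\r\n"
    b"Server: Apache/1.3.26 (Unix)\r\n"
    b"Last-Modified: Tue, 02 Sep 2003 21:20:16 GMT\r\n"
    b"Content-Length: 545\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML> (page content abbreviated) </HTML>"
)

if __name__ == "__main__":
    print(build_get_request("/~cs2a/"))
    status, reason, headers, body = parse_response(SAMPLE_RESPONSE)
    print(status, reason)
    print(headers["content-type"], headers["last-modified"])
```

Header names are lower-cased in the dictionary because HTTP treats them as case-insensitive.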
There are many web browsers currently available. At the time of writing the most common browsers are Internet Explorer and Netscape, but some of the lesser-known browsers, such as Opera and Amaya, provide additional features that are not currently supported by the mainstream browsers, such as mathematical and graphical markup processing.
URLs and Domain Names
As you probably know, all of the content on the World Wide Web can be accessed by providing its address to a browser. The formal name for a web address is URL, which stands for Uniform Resource Locator. (Some people also use URI, for Uniform Resource Identifier.) URLs are our first example of a formal language. Each URL has several parts, some of which are optional. Some examples of URLs are:
http://www.brandeis.edu
http://www.brandeis.edu:80
http://www.brandeis.edu:80/index.html
http://www.brandeis.edu:80/go/index.php?go=cosi
http://126.96.36.199/~tim
http://jscheme.cs.brandeis.edu:8080
ftp://ftp.cc.gatech.edu/pub/linux/
The simplest form of a URL is just PROTOCOL://DOMAINNAME, where PROTOCOL (for example, "http" or "ftp") specifies the type of service being accessed and DOMAINNAME specifies the computer that is offering the service. The general form for a URL is
PROTOCOL://DOMAINNAME:PORT/PATH/FILE.EXT?N=V&N2=V2#POS
Let's break this apart.
* "PROTOCOL" specifies the protocol that the web browser must use to communicate with the server. There are many other protocols besides http. The most common is "ftp", which is the "file transfer protocol." The "mailto:" protocol is also common and is used to allow the user to send email from a browser.
* "DOMAINNAME" is the symbolic name of the web server. All web servers have a unique IP address (as described above). The conversion between domain names and IP addresses is performed using special servers on the net called "Domain Name Servers." These servers accept domain names and send back the corresponding IP address. They are the equivalent of the "411" service on phone networks, and every browser must have the address of at least one Domain Name Server if it is going to use domain names. You can use an IP address directly in the URL instead of a domain name, but this is rarely done as IP addresses can be hard to remember.
* "PORT" is a number between 0 and 65535 which specifies the port used by the server. The default value is 80 for the http protocol.
* "PATH" specifies the location of the file on the server.
* "FILE.EXT" is the name of the file to be returned.
* "?N=V&..." is a mechanism for passing values to the server, which it can use to generate the web page it will send back to you. You often see this kind of address after filling out a search form for a search engine. When you bookmark such a page and return to it, the URL contains a precise description of everything you entered on the form.
* "#POS" specifies a location in the file. POS is a symbolic name (containing no blank spaces). The browser will scroll the window down to that position when first viewing the file.
These more complex URLs will become easier to understand after we have discussed servlets in a later chapter.
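The parts of a URL can be pulled apart with Python's standard urllib.parse module, which follows the standard ordering in which the "?N=V" part comes before the "#POS" part. In this sketch the "#top" position name is a made-up addition to one of the example URLs, the describe function and its dictionary keys are illustrative, and the table of default ports is an assumption limited to the two protocols discussed in this chapter.

```python
from urllib.parse import parse_qs, urlsplit

# Default ports for the two protocols discussed in this chapter.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe(url: str) -> dict:
    """Break a URL into the parts named in the text."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": port,
        "path": parts.path,
        "values": parse_qs(parts.query),  # the N=V pairs after the "?"
        "position": parts.fragment,       # the name after the "#"
    }


if __name__ == "__main__":
    examples = [
        "http://www.brandeis.edu",
        "http://www.brandeis.edu:80/go/index.php?go=cosi#top",  # "#top" is hypothetical
        "http://jscheme.cs.brandeis.edu:8080",
        "ftp://ftp.cc.gatech.edu/pub/linux/",
    ]
    for url in examples:
        print(url)
        for key, value in describe(url).items():
            print(f"  {key}: {value!r}")
```

Note that parse_qs returns a list for each name, because the same name may appear more than once in a URL.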