Understanding HTTP

What is HTTP?

HTTP is a client-server protocol by which two machines can communicate over a TCP/IP connection. An HTTP server is a program that sits listening on a machine's port for HTTP requests. An HTTP client (we will be using the terms HTTP client and web client interchangeably) opens a TCP/IP connection to the server via a socket, transmits a request for a document, then waits for a reply from the server. Once the request-reply sequence is completed, the socket is closed. So HTTP is a transactional protocol: in HTTP/1.0, the lifetime of a connection corresponds to a single request-reply sequence (a transaction).
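The transaction just described (connect, send one request, read the reply, close) can be sketched in a few lines of Python using a plain TCP socket. This is a minimal sketch, not a production client: the function name http10_get is our own, and it assumes the server follows HTTP/1.0 behaviour and closes the connection after its reply, which is how the client knows the response is complete.

```python
import socket

def http10_get(host: str, port: int, path: str) -> bytes:
    """One HTTP/1.0 transaction: connect, send a GET, read the reply until close."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"  # an empty line ends the request
    )
    # A fresh TCP connection is opened for this single request-reply sequence.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection: the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling http10_get("www.perlfect.com", 80, "/articles/index.shtml") would return the status line, headers and document as one byte string, provided that server is still reachable.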

HTTP is the protocol used for document exchange on the World Wide Web. Everything that happens on the web happens over HTTP transactions. TCP/IP networking and HTTP are the two essential components that make the web work. In order to write software that accesses the web (like a web browser or a custom web client), you need a basic understanding of both. In this article we will cover HTTP, how it works, and how to use it for simple transactions. We plan to add more articles to this site covering basic network programming issues relating to TCP/IP and HTTP.

The client side: HTTP requests

So what basically happens when we open a URL in the browser is that the browser works out from the URL the HTTP server's host machine and port, as well as the path of the document we are requesting from the server. For example, http://www.perlfect.com/articles/index.shtml refers to the document /articles/index.shtml on the server at www.perlfect.com, port 80 (no port is specified in the URL, so the default, 80, is used). The browser then composes an HTTP request for that document and opens a TCP/IP connection to the server. The client (the browser, that is) sends the request and waits for the server to respond with an HTTP response and, hopefully, the requested document. If all goes well, the browser displays the document in our desktop window, rendering the HTML code into a visual layout and making additional requests for any images or other files embedded in the HTML document.
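The browser's first two steps, finding the host, port and path in the URL and composing a request for the document, can be sketched in Python with the standard library's urllib.parse.urlsplit. The function name request_for_url is hypothetical, and only plain http:// URLs are handled.

```python
from urllib.parse import urlsplit

def request_for_url(url: str):
    """Return (host, port, request_bytes) for fetching an http:// URL with GET."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an http:// URL with a host: {url!r}")
    host = parts.hostname
    port = parts.port or 80  # no port in the URL, so use the default, 80
    path = parts.path or "/"  # an empty path means the server's root document
    if parts.query:
        path += "?" + parts.query
    # Name the port in the Host header only when it is not the default.
    host_header = host if port == 80 else f"{host}:{port}"
    request = f"GET {path} HTTP/1.0\r\nHost: {host_header}\r\n\r\n"
    return host, port, request.encode("ascii")
```

For the example URL this yields the host www.perlfect.com, port 80, and a request whose first line is GET /articles/index.shtml HTTP/1.0.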

Now, let's have a look under the hood to see what those HTTP requests look like. Suppose you type the URL of the previous example, http://www.perlfect.com/articles/index.shtml, into Netscape's location text box. Here's what the request will look like. (For the sake of clarity, the following request contains only as many headers as needed to demonstrate the general form and functionality of an HTTP request. Netscape will surely make up a more complicated request, but its essential parts are those shown below.)

  GET /articles/index.shtml HTTP/1.0
  User-Agent: Mozilla 4.0 (X; I; Linux-2.0.35i586)
  Host: www.perlfect.com
  Accept: image/gif, image/jpeg, */*
    

The first line contains three important pieces of information: the request method (GET), the requested document (/articles/index.shtml) and the HTTP protocol version that the client uses (1.0). You might wonder what the request method is, but you don't need to worry about it at this point. There are a few different request methods, the most common ones being GET, HEAD and POST.

There are others, too, that are used much less frequently, and we won't discuss them. The general structure of a request is the same for all methods, so we will stick to GET for now to demonstrate how requests work in general.

Following that, there are a number of lines called request headers. They all have the form Header-Name: Header-Value, and they specify information and parameters that help the server provide a suitable response. In this example the headers indicate the client software name and version, the server hostname the request is meant for, and the MIME types that the client is willing to accept. The hostname is needed because a single HTTP server may serve documents under several different names, each corresponding to a different directory tree, so the server must be told which name to look the document up under. A blank line marks the end of the request headers.
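On the server side, the request head can be taken apart line by line. The following is a minimal Python sketch, assuming the head has already been read up to the blank line; parse_request_head is a hypothetical name, header names are lower-cased because HTTP header names are case-insensitive, and a repeated header simply overwrites the earlier one to keep the sketch short.

```python
def parse_request_head(head: str):
    """Split a request head into (method, path, version, headers)."""
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")  # exactly three fields expected
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP request line: {lines[0]!r}")
    headers = {}
    for line in lines[1:]:
        if line == "":  # the blank line ends the head
            break
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```

On the sample request above, headers["host"] comes out as www.perlfect.com, which a server answering to several names could use to pick the directory tree to search.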

The server side: HTTP responses

Now, on to the server's response:

  HTTP/1.0 200 OK
  Date: Thu, 08 Oct 1998 16:17:52 GMT
  Server: Apache/1.1.1
  Content-type: text/html
  Content-length: 1538
  Last-modified: Mon, 05 Oct 1998 01:23:50 GMT

  <HTML>
  <HEAD>
  <TITLE>Perlfect Solutions</TITLE>
  ...
    

The first line contains the version of HTTP used in the response and the response status, both as a numerical code (200) and as a human-readable string (OK). There are a number of such response codes. Two common examples: 200 OK means that the document has been found and follows the response headers, and 404 Not Found means that the document path does not exist on the server.

Response headers work just like request headers: they pass information about the document in transit and about the status of the server and the request. In the example above, the headers provide information about the server software and version, the date and time the response was issued and, finally, the MIME type, length and last modification date of the document in transit.

A blank line marks the end of the head of the response, and then the document follows. After the browser has finished receiving the HTML document and the TCP/IP connection has been closed, it goes on to request any additional embedded documents (in-line images, for example) and renders the page's layout on screen. Clicking on a link causes the browser to issue a new request for the page the link points to, and so on.
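The response layout just described (status line, headers, blank line, document) can be taken apart with a short Python sketch. The function name parse_response is hypothetical; the sketch assumes the complete response is already in memory (for instance, read until the server closed the connection), lower-cases header names because they are case-insensitive, and keeps the body as bytes since it may be an image rather than HTML.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.0 response into (version, code, reason, headers, body)."""
    # The head ends at the first empty line (CR LF CR LF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: no blank line after the head")
    # Header text is ASCII; ISO-8859-1 decodes any byte value without failing.
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *reason = status_line.split(" ", 2)  # the reason may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body stays as bytes: it may be a GIF rather than text.
    return version, int(code), (reason[0] if reason else ""), headers, body
```

Because the head is split at the first blank line only, a binary body such as the GIF image shown later passes through untouched.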

Playing around

As mentioned earlier, the examples shown here, while perfectly correct and working, are merely indicative of the HTTP protocol. The reader is encouraged to experiment with the HTTP requests and responses exchanged by real clients and servers. For example, if you telnet to port 80 of a host with an active web server and type in a simple request like the one we gave above, followed by a blank line, you can watch the server's response come streaming back live. Try requesting non-existent documents, images, or whatever else to see real examples of responses. On the other end, check whether your web server provides diagnostic facilities that let you inspect the incoming requests from web browsers. As with anything in computing, there's a lot to learn from such playing around.
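If your web server has no such diagnostic facility, a throwaway listener serves the same purpose. The Python sketch below is a hypothetical helper, not part of any server: inspect_one_request accepts one connection, returns the raw request head it received, and answers with a small 404 response so the browser is not left waiting. Port 8080 in the usage comment is an arbitrary choice; point a browser at http://127.0.0.1:8080/ while it runs to see exactly what the browser sends.

```python
import socket

def inspect_one_request(listener: socket.socket) -> bytes:
    """Accept one connection, answer 404, and return the raw request head."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the request head (a POST body is not read in full).
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        body = b"This test server only records requests.\n"
        head = (
            "HTTP/1.0 404 Not Found\r\n"
            "Content-type: text/plain\r\n"
            f"Content-length: {len(body)}\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)
    return data

# Example use (blocks until a client connects):
# with socket.create_server(("127.0.0.1", 8080)) as listener:
#     print(inspect_one_request(listener).decode("iso-8859-1"))
```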

HTTP/1.0 Details

Servers just return documents via HTTP, while clients must present documents to users. Thus HTTP servers are simpler than clients (6,500 vs. 80,000 lines of C code for NCSA httpd 1.3 and Mosaic 2.5).

HTTP/1.0 is a very simple-minded protocol. It uses ASCII commands terminated by CR/LF. So you can simulate an HTTP client with a telnet session:

daphne.cs.vt.edu> telnet www 80
Trying 128.173.40.201... Connected to info.cs.vt.edu.
Escape character is '^]'.
GET / HTTP/1.0

<HTML>
<HEAD>
<TITLE>
Virginia Tech Computer Science Department Home Page
</TITLE>
</HEAD>

<BODY>

<CENTER>
<A HREF=/htbin/imagemap/image/maps/vtcs.gif.map>
<IMG BORDER=0 SRC=/image/new_headers/vtcs.gif 
 ALT="Virginia Tech Department of Computer Science" ISMAP></A>
<P>
660 McBryde Hall<BR>
Blacksburg, VA 24061<BR>
Phone: (540)231-6931<BR>
FAX: (540)231-6075<P>
Department Head: Dr. John M. Carroll
</CENTER>

<HR SIZE=4>

<P> Welcome to the Virginia Tech Department of Computer Science Home
Page. 

...

<H6>
Last updated August 28, 1996<BR>
http://www.cs.vt.edu/
</H6>
</BODY>
</HTML>
Connection closed by foreign host.

In the example above, we fetched the home page of host www.cs.vt.edu.


HTTP Message Types

HTTP/1.0 request format:

request-line: method request-URI HTTP-version
headers (0 or more lines)
<blank line>
body (only for a POST request)

Possible requests:

GET
Request a document named by a URI (uniform resource identifier - superclass of URL)
HEAD
Return only the headers for the specified URI (e.g., to test a link for validity or check for recent modification)
POST
Post a document: electronic mail, form input, news.
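HEAD suits the link-checking use mentioned above: the server answers with the status line and headers a GET would produce, but without the document. A minimal Python sketch; the name check_link is hypothetical, and it assumes an HTTP/1.0 server that closes the connection after replying.

```python
import socket

def check_link(host: str, port: int, path: str) -> int:
    """Send a HEAD request and return the numeric status code (e.g. 200 or 404)."""
    request = f"HEAD {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        reply = b""
        # A HEAD reply carries no body, so reading until close stays cheap.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    status_line = reply.split(b"\r\n", 1)[0].decode("iso-8859-1")
    if not status_line.startswith("HTTP/"):
        raise ValueError(f"unexpected reply: {status_line!r}")
    return int(status_line.split(" ")[1])
```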

HTTP/1.0 response format:

status-line: HTTP-version response-code human-readable-phrase
headers (0 or more)
<blank line>
body

Example: Header in Response

daphne.cs.vt.edu> telnet ei 80
Trying 128.173.40.129... Connected to ei.cs.vt.edu.
Escape character is '^]'.
GET /~wwwbtb/fall.96/ClassNotes/Protocols/MIMEscreen.gif HTTP/1.0

HTTP/1.0 200 Document follows
Date: Tue, 10 Sep 1996 14:34:06 GMT
Server: NCSA/1.4.2
Content-type: image/gif
Last-modified: Tue, 10 Sep 1996 13:25:26 GMT
Content-length: 9755

GIF87<9750 non-ascii characters>Connection closed by foreign host.

Example: If-Modified-Since

A client that has a copy of a document in its cache may send a GET with the If-Modified-Since header to check whether the cached copy is up-to-date.

daphne.cs.vt.edu> telnet ei 80
Trying 128.173.40.129... Connected to ei.cs.vt.edu.
Escape character is '^]'.
GET /~wwwbtb/fall.96/ClassNotes/Protocols/MIMEscreen.gif HTTP/1.0
If-Modified-Since: Saturday, 10-Sep-96 20:20:14

HTTP/1.0 304 Not modified
Date: Tue, 10 Sep 1996 14:40:58 GMT
Server: NCSA/1.4.2

Connection closed by foreign host.
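A client-side sketch of this cache check in Python follows. The helper names are hypothetical, and cached_mtime is assumed to be the Last-modified time of the cached copy, in seconds since the epoch. The date is written in the same GMT format as the Date and Last-modified headers above, using the standard library's email.utils.formatdate. A 304 reply means the cached copy can be reused; a 200 reply carries a fresh copy of the document.

```python
from email.utils import formatdate

def conditional_get(path: str, host: str, cached_mtime: float) -> bytes:
    """Build a GET that asks for the document only if it changed after cached_mtime."""
    # formatdate(..., usegmt=True) gives e.g. "Tue, 10 Sep 1996 13:25:26 GMT".
    since = formatdate(cached_mtime, usegmt=True)
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"If-Modified-Since: {since}\r\n"
        "\r\n"
    )
    return request.encode("ascii")

def cached_copy_is_current(status_code: int) -> bool:
    """304 Not Modified means the cached copy can be reused; 200 carries a new copy."""
    return status_code == 304
```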

Example: Server Redirect

daphne.cs.vt.edu> telnet ei 80
Trying 128.173.40.129... Connected to ei.cs.vt.edu.
Escape character is '^]'.
GET /~wwwbtb HTTP/1.0

HTTP/1.0 302 Found
Date: Tue, 10 Sep 1996 14:42:49 GMT
Server: NCSA/1.4.2
Location: http://ei.cs.vt.edu/~wwwbtb/
Content-type: text/html

<HEAD><TITLE>Document moved</TITLE></HEAD>
<BODY><H1>Document moved</H1>
This document has moved <A
HREF="http://ei.cs.vt.edu/~wwwbtb/">here</A>.<P>
</BODY>
Connection closed by foreign host.
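Here the server redirected /~wwwbtb to /~wwwbtb/ (note the trailing slash), and a browser simply issues a new request for the Location URL. A minimal Python sketch of that decision follows; redirect_target is a hypothetical name, the headers dict is assumed to have lower-cased names, and resolving a relative Location with urllib.parse.urljoin is a lenient extra, since HTTP/1.0 expects an absolute URL in this header.

```python
from typing import Optional
from urllib.parse import urljoin

def redirect_target(request_url: str, status_code: int, headers: dict) -> Optional[str]:
    """Return the URL to request next, or None if the response is not a redirect."""
    if status_code not in (301, 302):  # permanent or temporary move
        return None
    location = headers.get("location")
    if location is None:
        raise ValueError("redirect response without a Location header")
    # An absolute Location is returned unchanged; a relative one is resolved.
    return urljoin(request_url, location)
```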

HTTP Header Fields

From Figure 13.3 in Stevens:
Header Name         Meaning                     Request?  Response?  Body?
Allow                                                                yes
Authorization       Send userid/password        yes
Content-Encoding                                                     yes
Content-Length      How many bytes of data?                          yes
Content-Type        MIME-type of data                                yes
Date                Current time/date           yes       yes
Expires             When to discard from cache                       yes
From                                            yes
If-Modified-Since   For conditional GET         yes
Last-Modified       Date data last modified                          yes
Location                                                  yes
MIME-Version        So MIME names can change    yes       yes
Pragma              No-cache, etc.              yes       yes
Referer             URL previously visited      yes
Server                                                    yes
User-Agent          Web browser name            yes
WWW-Authenticate                                          yes

Return Codes

Classification of return codes
Response Code   Meaning
1xx             Not used
2xx             Request succeeded
3xx             Redirection
4xx             Client error
5xx             Server error
From Figure 13.4 in Stevens:
Response Meaning
200 OK, request succeeded
202 Request accepted, but processing incomplete
301 Requested URL has been assigned a new, permanent URL
302 Requested URL has been temporarily assigned a new URL
304 Document not modified (response to conditional GET)
400 Bad request
401 Request not accepted because user authentication required
403 Forbidden for unspecified reason
404 Not found
500 Internal server error
501 Not implemented
502 Invalid response from gateway or upstream server
503 Service temporarily unavailable
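Because the first digit alone determines the class of a response, a client can react sensibly even to codes it does not recognize; HTTP/1.0 asks clients to treat an unknown code like the x00 code of its class. A minimal Python sketch; the function name and class labels are our own.

```python
CLASSES = {
    1: "informational",  # unused in HTTP/1.0; HTTP/1.1 names this class informational
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code: int) -> str:
    """Classify a status code by its first digit, so unknown codes still get a class."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CLASSES[code // 100]
```

For instance, status_class(299) returns "success" even though 299 appears in no table above, so a client would handle it like 200.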