Exploring Java

Chapter 9 - Network Programming

Contents:
Sockets
Datagram Sockets
Working with URLs
Web Browsers and Handlers
Writing a Content Handler
Writing a Protocol Handler

The network is the soul of Java. Most of what is new and exciting about Java centers around the potential for new kinds of dynamic, networked applications. This chapter discusses the java.net package, which contains classes for communications and working with networked resources. These classes fall into two categories: the sockets API and classes for working with Uniform Resource Locators (URLs). Figure 9.1 shows all of the classes in java.net.

Figure 9.1: The java.net package

[Graphic: Figure 9-1]

Java's sockets interface provides access to the standard network protocols used for communications between hosts on the Internet. Sockets are the mechanism underlying all other kinds of portable networked communications. Your processes can use sockets to communicate with a server or peer applications on the Net, but you have to implement your own application-level protocols for handling and interpreting the data. Higher-level functionality, like remote procedure calls and distributed objects, is implemented with sockets.

The Java URL classes provide an API for accessing well-defined networked resources, like documents and applications on servers. The classes use an extensible set of prefabricated protocol and content handlers to perform the necessary communication and data conversion for accessing URL resources. With URLs, an application can fetch a complete file or database record from a server on the network with just a few lines of code. Applications like Web browsers, which deal with networked content, use the URL class to simplify the task of network programming. They also take advantage of the dynamic nature of Java, which allows handlers for new types of URLs to be added on the fly. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying the original application.

In this chapter, I'll try to provide some practical and realistic examples of Java network programming using both APIs. Sadly, the current state of affairs is disappointing. The real release of HotJava isn't available, and Netscape Navigator imposes many restrictions on what you can do. In addition, a few standards that we need haven't been defined. Nevertheless, you can use all of Java's networking capabilities to build your own free-standing applications. I'll point out the shortcomings with Netscape Navigator and the standards scene as I go along.

9.1 Sockets

Sockets are a low-level programming interface for networked communications. They send streams of data between applications that may or may not be on the same host. Sockets originated in BSD UNIX and are, in other languages, hairy and complicated things with lots of small parts that can break off and choke little children. The reason for this is that most socket APIs can be used with almost any kind of underlying network protocol. Since the protocols that transport data across the network can have radically different features, the socket interface can be quite complex. (For a discussion of sockets in general, see UNIX Network Programming, by Richard Stevens [Prentice-Hall].)

Java supports a simplified object-oriented interface to sockets that makes network communications considerably easier. If you have done network programming using sockets in C or another structured language, you should be pleasantly surprised at how simple things can be when objects encapsulate the gory details. If this is the first time you've come across sockets, you'll find that talking to another application can be as simple as reading a file or getting user input. Most forms of I/O in Java, including network I/O, use the stream classes described in Chapter 8, Input/Output Facilities. Streams provide a unified I/O interface; reading or writing across the Internet is similar to reading or writing a file on the local system.

Java provides different kinds of sockets to support two distinct classes of underlying protocols. In this first section, we'll look at Java's Socket class, which uses a connection-oriented protocol. A connection-oriented protocol gives you the equivalent of a telephone conversation; after establishing a connection, two applications can send data back and forth; the connection stays in place even when no one is talking. The protocol ensures that no data is lost and that it always arrives in order. In the next section we'll look at the DatagramSocket class, which uses a connectionless protocol. A connectionless protocol is more like the postal service. Applications can send short messages to each other, but no attempt is made to keep the connection open between messages, to keep the messages in order, or even to guarantee that they arrive.

In theory, just about any protocol family can be used underneath the socket layer: Novell's IPX, Apple's AppleTalk, even the old ChaosNet protocols. But this isn't a theoretical world. In practice, there's only one protocol family people care about on the Internet, and only one protocol family Java supports: the Internet protocols, IP. The Socket class speaks TCP, and the DatagramSocket class speaks UDP, both standard Internet protocols. These protocols are available on any system that is connected to the Internet.

Clients and Servers

When writing network applications, it's common to talk about clients and servers. The distinction is increasingly vague, but the side that initiates the conversation is usually the client. The side that accepts the request to talk is usually the server. In the case where there are two peer applications using sockets to talk, the distinction is less important, but for simplicity we'll use the above definition.

For our purposes, the most important difference between a client and a server is that a client can create a socket to initiate a conversation with a server application at any time, while a server must prepare to listen for incoming conversations in advance. The java.net.Socket class represents a single side of a socket connection on either the client or server. In addition, the server uses the java.net.ServerSocket class to wait for connections from clients. An application acting as a server creates a ServerSocket object and waits, blocked in a call to its accept() method, until a connection arrives. When it does, the accept() method creates a Socket object the server uses to communicate with the client. A server carries on multiple conversations at once; there is only a single ServerSocket, but one active Socket object for each client, as shown in Figure 9.2.

Figure 9.2: Clients and servers, Sockets and ServerSockets

[Graphic: Figure 9-2]

A client needs two pieces of information to locate and connect to another server on the Internet: a hostname (used to find the host's network address) and a port number. The port number is an identifier that differentiates between multiple clients or servers on the same host. A server application listens on a prearranged port while waiting for connections. Clients select the port number assigned to the service they want to access. If you think of the host computers as hotels and the applications as guests, then the ports are like the guests' room numbers. For one guest to call another, he or she must know the other party's hotel name and room number.

Clients

A client application opens a connection to a server by constructing a Socket that specifies the hostname and port number of the desired server:

try { 
    Socket sock = new Socket("wupost.wustl.edu", 25); 
}  
catch ( UnknownHostException e ) { 
    System.out.println("Can't find host."); 
}  
catch ( IOException e ) { 
    System.out.println("Error connecting to host."); 
} 

This code fragment attempts to connect a Socket to port 25 (the SMTP mail service) of the host wupost.wustl.edu. The client handles the possibility that the hostname can't be resolved (UnknownHostException) and that it might not be able to connect to it (IOException).
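The Socket constructor above blocks until the connection succeeds or fails, and when a host is unreachable that can take a long time. Since Java 1.4 you can instead create an unconnected Socket and call its connect() method with a timeout. A minimal sketch under that assumption; ConnectDemo, connect(), and the 5-second timeout are illustrative choices, not part of the original example:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

class ConnectDemo {
    // Hypothetical helper: open a TCP connection, giving up after timeoutMillis.
    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        Socket sock = new Socket();                  // created unconnected
        try {
            // InetSocketAddress tries to resolve the name; if it can't,
            // connect() throws UnknownHostException.
            sock.connect(new InetSocketAddress(host, port), timeoutMillis);
            return sock;
        } catch (IOException e) {
            sock.close();                            // don't leak the failed socket
            throw e;
        }
    }

    public static void main(String[] args) {
        try (Socket sock = connect("wupost.wustl.edu", 25, 5000)) {
            System.out.println("Connected to " + sock.getInetAddress());
        } catch (UnknownHostException e) {
            System.out.println("Can't find host.");
        } catch (SocketTimeoutException e) {
            System.out.println("Timed out connecting to host.");
        } catch (IOException e) {
            System.out.println("Error connecting to host.");
        }
    }
}
```

Because an unresolvable name still surfaces as an UnknownHostException, the same catch clauses as in the fragment above carry over; only the timeout case is new.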

As an alternative to using a hostname, you can provide a string version of the host's IP address:

Socket sock = new Socket("128.252.120.1", 25);    // wupost.wustl.edu 

Once a connection is made, input and output streams can be retrieved with the Socket getInputStream() and getOutputStream() methods. The following (rather arbitrary and strange) conversation illustrates sending and receiving some data with the streams. Refer to Chapter 8, Input/Output Facilities for a complete discussion of working with streams.

try { 
    Socket server = new Socket("foo.bar.com", 1234); 
    InputStream in = server.getInputStream(); 
    OutputStream out = server.getOutputStream(); 
 
    // Write a byte 
    out.write(42); 
 
    // Say "Hello" (send newline delimited string) 
    PrintStream pout = new PrintStream( out ); 
    pout.println("Hello!"); 
 
    // Read a byte 
    int back = in.read();    // read() returns an int: 0-255, or -1 at end of stream
 
    // Read a newline delimited string 
    DataInputStream din = new DataInputStream( in ); 
    String response = din.readLine(); 
 
    server.close(); 
}  
catch (IOException e ) { } 

In the exchange above, the client first creates a Socket for communicating with the server. The Socket constructor specifies the server's hostname (foo.bar.com) and a prearranged port number (1234). Once the connection is established, the client writes a single byte to the server using the OutputStream's write() method. It then wraps a PrintStream around the OutputStream in order to send text more easily. Next, it performs the complementary operations, reading a byte from the server using InputStream's read() and then creating a DataInputStream from which to get a string of text. Finally, it terminates the connection with the close() method. All these operations have the potential to generate IOExceptions; the catch clause is where our application would deal with these.
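The client fragment needs something on the other end to talk to. As a hedged sketch, here is the same client half packaged as a method that can be pointed at any host and port. It writes its text with an explicit character set and reads the reply line with a BufferedReader, since DataInputStream.readLine() has been deprecated since Java 1.1. HelloClient and converse() are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HelloClient {
    // Perform the client half of the example dialog and return what the
    // server sent back, formatted as "byte:line".
    static String converse(String host, int port) throws IOException {
        try (Socket server = new Socket(host, port)) {        // closed automatically
            OutputStream out = server.getOutputStream();
            out.write(42);                                     // a single byte
            out.write("Hello!\n".getBytes(StandardCharsets.US_ASCII)); // newline-delimited text
            out.flush();

            InputStream in = server.getInputStream();
            int back = in.read();                              // one byte, or -1 at end of stream
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
            String response = reader.readLine();               // null at end of stream
            return back + ":" + response;
        }
    }
}
```

Reading the single byte from the raw InputStream before wrapping it in the BufferedReader matters: once the reader is created it may buffer ahead, so any raw reads must come first.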

Servers

After a connection is established, a server application uses the same kind of Socket object for its side of the communications. However, to accept a connection from a client, it must first create a ServerSocket, bound to the correct port. Let's recreate the previous conversation from the server's point of view:

// Meanwhile, on foo.bar.com... 
try { 
    ServerSocket listener = new ServerSocket( 1234 ); 
 
    while ( !finished ) { 
        Socket aClient = listener.accept();    // wait for connection 
 
        InputStream in = aClient.getInputStream(); 
        OutputStream out = aClient.getOutputStream(); 
 
        // Read a byte 
        int importantByte = in.read();
 
        // Read a string 
        DataInputStream din = new DataInputStream( in ); 
        String request = din.readLine(); 
 
        // Write a byte 
        out.write(43); 
 
        // Say "Goodbye" 
        PrintStream pout = new PrintStream( out ); 
        pout.println("Goodbye!"); 
 
        aClient.close(); 
    } 
 
    listener.close(); 
 
} 
catch (IOException e ) { } 

First, our server creates a ServerSocket attached to port 1234. On some systems there are rules about what ports an application can use. Port numbers below 1024 are usually reserved for system processes and standard, well-known services, so we pick a port number outside of this range. The ServerSocket need be created only once. Thereafter we can accept as many connections as arrive.

Next we enter a loop, waiting for the accept() method of the ServerSocket to return an active Socket connection from a client. When a connection has been established, we perform the server side of our dialog, then close the connection and return to the top of the loop to wait for another connection. Finally, when the server application wants to stop listening for connections altogether, it calls the close() method of the ServerSocket.[1]

[1] A somewhat obscure feature of TCP/IP (the TIME_WAIT state) specifies that if a server actively closes a connection while a client is connected, the server may not be able to bind (attach itself) to the same port on its host again for a period of time (twice the maximum lifetime of a packet on the network). The delay keeps stray packets from the old connection from being mistaken for part of a new one. It's possible to turn off this feature, and it's likely that your Java implementation will have done so.
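In Java 1.4 and later, this waiting period can be relaxed with the SO_REUSEADDR socket option, which ServerSocket exposes through setReuseAddress(). The option has to be set before the socket is bound, so the sketch creates an unbound ServerSocket first. ReusableListener and open() are hypothetical names, and whether the option is honored is ultimately up to the operating system:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class ReusableListener {
    // Create a listening socket that may rebind a port still held by a
    // recently closed connection.
    static ServerSocket open(int port) throws IOException {
        ServerSocket listener = new ServerSocket();      // created unbound
        listener.setReuseAddress(true);                  // must come before bind()
        listener.bind(new InetSocketAddress(port));      // port 0 = any free port
        return listener;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = open(0)) {
            System.out.println("Listening on port " + listener.getLocalPort()
                + ", SO_REUSEADDR=" + listener.getReuseAddress());
        }
    }
}
```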

As you can see, this server is single-threaded: it handles one connection at a time and doesn't call accept() to listen for a new connection until it's finished with the current one. A more realistic server would have a loop that accepts connections and passes each one off to its own thread for processing, so that several clients can be served concurrently. (Our tiny HTTP daemon in a later section will do just this.)
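A thread-per-connection version of the server half of our dialog might look like the following sketch. It assumes Java 8 or later for the lambda; ThreadedGoodbyeServer, serve(), and handle() are hypothetical names, and nothing limits the number of threads, which a production server would want to do:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ThreadedGoodbyeServer {
    // Accept connections until the listener is closed, giving each client
    // its own thread so one slow conversation can't hold up the others.
    static void serve(ServerSocket listener) throws IOException {
        while (true) {
            Socket client = listener.accept();          // blocks until a client connects
            new Thread(() -> handle(client)).start();   // one thread per conversation
        }
    }

    // The server half of the example dialog: read a byte and a line,
    // then answer with a byte and "Goodbye!".
    static void handle(Socket client) {
        try (Socket c = client) {                       // closes the socket when done
            InputStream in = c.getInputStream();
            int importantByte = in.read();
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
            String request = reader.readLine();
            System.out.println("Got " + importantByte + " and " + request);

            OutputStream out = c.getOutputStream();
            out.write(43);
            out.write("Goodbye!\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            System.out.println("I/O error " + e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(1234)) {
            serve(listener);
        }
    }
}
```

Closing the ServerSocket from another thread makes the blocked accept() throw, which is one way to end serve().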

Sockets and security

The examples above presuppose the client has permission to connect to the server, and that the server is allowed to listen on the specified port. This is not always the case. Specifically, applets and other applications run under the auspices of a SecurityManager that can impose arbitrary restrictions on what hosts they may or may not talk to, and whether they can listen for connections. The security policy imposed by the current version of Netscape Navigator allows applets to open socket connections only to the host that served them. That is, they can talk back only to the server from which their class files were retrieved. Applets are not allowed to open server sockets themselves.

Now, this doesn't mean an applet can't cooperate with its server to communicate with anyone, anywhere. A server could run a proxy that lets the applet communicate indirectly with anyone it likes. What the current security policy prevents is malicious applets roaming around inside corporate firewalls. It places the burden of security on the originating server, and not the client machine. Restricting access to the originating server limits the usefulness of "trojan" applications that do annoying things from the client side. You won't let your proxy mail bomb people, because you'll be blamed.

The DateAtHost Client

Many networked workstations run a time service that dispenses their local clock time on a well-known port. This was a precursor of NTP, the more general Network Time Protocol. In the next example, DateAtHost, we'll make a specialized subclass of java.util.Date that fetches the time from a remote host instead of initializing itself from the local clock. (See Chapter 7, Basic Utility Classes for a complete discussion of the Date class.)

DateAtHost connects to the time service (port 37) and reads four bytes representing the time on the remote host. These four bytes are interpreted as an integer representing the number of seconds since the turn of the century (January 1, 1900). DateAtHost converts this to Java's variant of the absolute time (milliseconds since January 1, 1970, a date that should be familiar to UNIX users) and then uses the remote host's time to initialize itself:

import java.net.Socket;
import java.io.*;

public class DateAtHost extends java.util.Date {
    static int timePort = 37;                   // the standard time service port
    static final long offset = 2208988800L;     // seconds from Jan 1, 1900 to
                                                // Jan 1, 1970 00:00 GMT

    public DateAtHost( String host ) throws IOException {
        this( host, timePort );
    }

    public DateAtHost( String host, int port ) throws IOException {
        Socket sock = new Socket( host, port );
        DataInputStream din =
            new DataInputStream(sock.getInputStream());
        int time = din.readInt();               // 4 bytes, network byte order
        sock.close();

        // Treat the 32 bits as unsigned, move the origin from 1900 to 1970,
        // and convert seconds to milliseconds.
        setTime( ((time & 0xFFFFFFFFL) - offset) * 1000 );
    }
}

That's all there is to it. It's not very long, even with a few frills. We have supplied two possible constructors for DateAtHost. Normally we'll use the first, which simply takes the name of the remote host as an argument. The second, overloaded constructor specifies the hostname and the port number of the remote time service. (If the time service were running on a nonstandard port, we would use the second constructor to specify the alternate port number.) This second constructor does the work of making the connection and setting the time. The first constructor simply invokes the second (using the this() construct) with the default port as an argument. Supplying simplified constructors that invoke their siblings with default arguments is a common and useful technique.

The second constructor opens a socket to the specified port on the remote host. It creates a DataInputStream to wrap the input stream and then reads a 4-byte integer using the readInt() method. It's no coincidence the bytes are in the right order. Java's DataInputStream and DataOutputStream classes work with the bytes of integer types in network byte order (most significant to least significant). The time protocol (and other standard network protocols that deal with binary data) also uses the network byte order, so we don't need to call any conversion routines. (Explicit data conversions would probably be necessary if we were using a nonstandard protocol, especially when talking to a non-Java client or server.) After reading the data, we're finished with the socket, so we close it, terminating the connection to the server. Finally, the constructor initializes the rest of the object by calling Date's setTime() method with the calculated time value.[2]
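To see network byte order at work, the sketch below decodes the same four bytes two ways: with DataInputStream.readInt(), as DateAtHost does, and by assembling them by hand with the first byte as the most significant. The bytes 0x83 0xAA 0x7E 0x80 are chosen for illustration (they encode 2208988800, the offset constant in DateAtHost), not captured from a real server; ByteOrderDemo is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ByteOrderDemo {
    // Decode four bytes the way DateAtHost does: readInt() treats the
    // first byte as the most significant (network byte order).
    static int readNetworkInt(byte[] fourBytes) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(fourBytes)).readInt();
    }

    // The same four bytes assembled by hand, to show what readInt() does.
    static int assembleBigEndian(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = { (byte) 0x83, (byte) 0xAA, (byte) 0x7E, (byte) 0x80 };
        int time = readNetworkInt(wire);
        System.out.println(time == assembleBigEndian(wire)); // prints true
        System.out.println(time & 0xFFFFFFFFL);               // prints 2208988800
    }
}
```

The int itself comes out negative, because the high bit is set; the mask recovers the unsigned value the protocol intends.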

[2] The conversion first creates a long value, which is the unsigned equivalent of the integer time. It subtracts an offset to make the time relative to the epoch (January 1, 1970) rather than to January 1, 1900, and multiplies by 1000 to convert to milliseconds.
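The arithmetic in the footnote can be checked with a short worked example. This sketch masks the signed int with 0xFFFFFFFFL to get its unsigned value; TimeConversion and toJavaMillis() are hypothetical names. An input equal to the offset itself (2208988800 seconds, which arrives as the int 0x83AA7E80) maps to the epoch, 0 milliseconds. Like any 32-bit count of seconds from 1900, the value wraps around in 2036, which this sketch does not handle:

```java
class TimeConversion {
    // Seconds from January 1, 1900 to January 1, 1970 (00:00 GMT),
    // the same constant DateAtHost calls offset.
    static final long OFFSET = 2208988800L;

    // Convert a 32-bit time-protocol value (seconds since 1900, sent as an
    // unsigned number but read into a signed int) to Java milliseconds.
    static long toJavaMillis(int timeProtocolSeconds) {
        long unsignedSeconds = timeProtocolSeconds & 0xFFFFFFFFL; // reinterpret as unsigned
        return (unsignedSeconds - OFFSET) * 1000;
    }

    public static void main(String[] args) {
        System.out.println(toJavaMillis(0x83AA7E80));     // prints 0
        System.out.println(new java.util.Date(toJavaMillis((int) 3208988800L)));
    }
}
```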

The DateAtHost class can work with a time retrieved from a remote host almost as easily as Date is used with the time on the local host. The only additional overhead is that we have to deal with the possible IOException that can be thrown by the DateAtHost constructor:

try { 
    Date d = new DateAtHost( "sura.net" ); 
    System.out.println( "The time over there is: " + d ); 
    int hours = d.getHours(); 
    int minutes = d.getMinutes(); 
    ... 
}  
catch ( IOException e ) { } 

This example fetches the time at the host sura.net and prints its value. It then looks at some components of the time using the getHours() and getMinutes() methods of the Date class.
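Many hosts may no longer run the port 37 time service, so a remote host such as sura.net might not answer. To try DateAtHost without depending on a remote machine, a tiny local server that speaks the same protocol is enough. A hedged sketch; LocalTimeServer is a hypothetical name, and it is meant to run on an arbitrary free port rather than 37, since ports below 1024 usually require special privileges:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class LocalTimeServer {
    static final long OFFSET = 2208988800L;   // seconds from 1900 to 1970

    // Answer each connection with the current time as a 4-byte, big-endian
    // count of seconds since January 1, 1900, then hang up.
    static void serve(ServerSocket listener) throws IOException {
        while (true) {
            try (Socket client = listener.accept()) {
                long secondsSince1900 = System.currentTimeMillis() / 1000 + OFFSET;
                DataOutputStream out = new DataOutputStream(client.getOutputStream());
                out.writeInt((int) secondsSince1900);   // keeps the low 32 bits
                out.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(Integer.parseInt(args[0]))) {
            serve(listener);
        }
    }
}
```

With the server started as `java LocalTimeServer 3737`, DateAtHost's two-argument constructor can be pointed at it with `new DateAtHost( "localhost", 3737 )`.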

The TinyHttpd Server

Have you ever wanted your very own Web server? Well, you're in luck. In this section, we're going to build TinyHttpd, a minimal but functional HTTP daemon. TinyHttpd listens on a specified port and services simple HTTP "get file" requests. They look something like this:

GET /path/filename [optional stuff] 

Your Web browser sends one or more such lines for each document it retrieves. Upon reading the request, the server tries to open the specified file and send its contents. If that document contains references to images or other items to be displayed inline, the browser continues with additional GET requests. For best performance (especially in a time-slicing environment), TinyHttpd services each request in its own thread. Therefore, TinyHttpd can service several requests concurrently.

Over and above the limitations imposed by its simplicity, TinyHttpd suffers from the limitations imposed by the fickleness of filesystem access, as discussed in Chapter 8, Input/Output Facilities. It's important to remember that file pathnames are still architecture dependent, as is the very concept of a filesystem. This example should work, as is, on UNIX and DOS-like systems, but may require some customizations to account for differences on other platforms. It's possible to write more elaborate code that uses the environmental information provided by Java to tailor itself to the local system. (Chapter 8, Input/Output Facilities, gives some hints about how to do this.)

WARNING:

This example will serve files from your host without protection. Don't try this at work.

Now, without further ado, here's TinyHttpd:

import java.net.*; 
import java.io.*; 
import java.util.*; 
 
public class TinyHttpd {  
    public static void main( String argv[] ) throws IOException { 
        ServerSocket ss = new ServerSocket(Integer.parseInt(argv[0])); 
        while ( true ) 
            new TinyHttpdConnection( ss.accept() ); 
    } 
} 
 
class TinyHttpdConnection extends Thread { 
    Socket sock; 
    TinyHttpdConnection ( Socket s ) { 
        sock = s; 
        setPriority( NORM_PRIORITY - 1 ); 
        start(); 
    } 
 
    public void run() { 
        try { 
            OutputStream out = sock.getOutputStream(); 
            String req = 
                new DataInputStream(sock.getInputStream()).readLine(); 
            System.out.println( "Request: "+req ); 
 
            StringTokenizer st = new StringTokenizer( req ); 
            if ( (st.countTokens() >= 2) && 
                  st.nextToken().equals("GET") ) { 
                if ( (req = st.nextToken()).startsWith("/") ) 
                    req = req.substring( 1 ); 
                if ( req.endsWith("/") || req.equals("") )
                    req = req + "index.html";
 
                try {
                    FileInputStream fis = new FileInputStream( req );
                    byte [] data = new byte [ fis.available() ];
                    // readFully() loops until the array is full; a single
                    // read() may return fewer bytes than requested
                    new DataInputStream( fis ).readFully( data );
                    fis.close();
                    out.write( data );
                }
                catch ( FileNotFoundException e ) {
                    new PrintStream( out ).println( "404 Not Found" );
                }
            } else
                new PrintStream( out ).println( "400 Bad Request" );

            sock.close();
        }
        catch ( IOException e ) {
            System.out.println( "I/O error " + e );
        }
    } 
} 

Compile TinyHttpd and place it in your class path. Go to a directory with some interesting documents and start the daemon, specifying an unused port number as an argument. For example:

% java TinyHttpd 1234

You should now be able to use your Web browser to retrieve files from your host. You'll have to specify the nonstandard port number in the URL. For example, if your hostname is foo.bar.com, and you started the server as above, you could reference a file as in:

http://foo.bar.com:1234/welcome.html 

TinyHttpd looks for files relative to its current directory, so the pathnames you provide should be relative to that location. Retrieved some files? Al'righty then, let's take a closer look.

TinyHttpd consists of two classes. The public TinyHttpd class contains the main() method of our standalone application. It begins by creating a ServerSocket, attached to the specified port. It then loops, waiting for client connections and creating instances of the second class, a TinyHttpdConnection thread, to service each request. The while loop waits for the ServerSocket accept() method to return a new Socket for each client connection. The Socket is passed as an argument to construct the TinyHttpdConnection thread that handles it.

TinyHttpdConnection is a subclass of Thread. It lives long enough to process one client connection and then dies. TinyHttpdConnection's constructor does three things. After saving its Socket argument, it adjusts its own priority and then invokes start() to bring its run() method to life. By lowering its priority to NORM_PRIORITY-1 (just below the default priority), we ensure that the threads servicing established connections won't block TinyHttpd's main thread from accepting new requests. (On a time-slicing system, this is less important.)

The body of TinyHttpdConnection's run() method is where all the magic happens. First, we fetch an OutputStream for talking back to our client. The second line reads the GET request from the InputStream into the variable req. This request is a single newline-terminated String that looks like the GET request we described earlier. Since this is the only time we read from this socket, it's hard to resist the urge to be terse. Alternatively, we could break that statement into three steps: getting the InputStream, creating the DataInputStream wrapper, and reading the line. The three-line version is certainly more readable and should not be noticeably slower.

We then parse the contents of req to extract a filename. The next few lines are a brief exercise in string manipulation. We create a StringTokenizer and make sure there are at least two tokens. Using nextToken(), we take the first token and make sure it's the word GET. (If either condition fails, we have an error.) Then we take the next token (which should be a filename), assign it to req, and check whether it begins with "/". If so, we use substring() to strip the first character, giving us a filename relative to the current directory. If it doesn't begin with "/", the filename is already relative to the current directory. Finally, we check to see if the requested filename looks like a directory name (i.e., ends in a slash) or is empty. In these cases, we append the familiar default filename index.html.
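Pulling this parsing logic out into a method of its own makes it easy to try on sample request lines. A sketch; RequestParser and requestedFile() are hypothetical names, and the null check for a client that disconnects before sending anything is an addition that TinyHttpd itself doesn't make:

```java
import java.util.StringTokenizer;

class RequestParser {
    // Mirror TinyHttpd's parsing: return the file to serve for a request
    // line such as "GET /docs/ HTTP/1.0", or null if it isn't a usable GET.
    static String requestedFile(String requestLine) {
        if (requestLine == null) return null;               // client closed early
        StringTokenizer st = new StringTokenizer(requestLine);
        if (st.countTokens() < 2 || !st.nextToken().equals("GET"))
            return null;                                    // "400 Bad Request"
        String file = st.nextToken();
        if (file.startsWith("/"))
            file = file.substring(1);                       // relative to current dir
        if (file.endsWith("/") || file.equals(""))
            file = file + "index.html";                     // directory default
        return file;
    }
}
```

For example, "GET / HTTP/1.0" yields index.html and "GET /docs/ HTTP/1.0" yields docs/index.html, while "POST /form HTTP/1.0" yields null.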

Once we have the filename, we try to open the specified file and load its contents into a large byte array. (We did something similar in the ListIt example in Chapter 8, Input/Output Facilities.) If all goes well, we write the data out to the client on the OutputStream. If we can't parse the request or the file doesn't exist, we wrap our OutputStream with a PrintStream to make it easier to send a textual message. Then we return an appropriate HTTP error message. Finally, we close the socket and return from run(), ending our thread.

Taming the daemon

The biggest problem with TinyHttpd is that there are no restrictions on the files it can access. With a little trickery, the daemon will happily send any file in your filesystem to the client. It would be nice if we could restrict TinyHttpd to files that are in the current directory, or a subdirectory. To make the daemon safer, let's add a security manager. I discussed the general framework for security managers in Chapter 7, Basic Utility Classes. Normally, a security manager is used to prevent Java code downloaded over the Net from doing anything suspicious. However, a security manager will serve nicely to restrict file access in a self-contained application.

Here's the code for the security manager class:

import java.io.*; 
 
class TinyHttpdSecurityManager extends SecurityManager {  
 
    public void checkAccess(Thread g) { }
    public void checkListen(int port) { }
    public void checkLink(String lib) { }
    public void checkPropertyAccess(String key) { }
    public void checkAccept(String host, int port) { }
    public void checkWrite(FileDescriptor fd) { }
    public void checkRead(FileDescriptor fd) { }
 
    public void checkRead( String s ) {  
        if ( new File(s).isAbsolute() || (s.indexOf("..") != -1) ) 
            throw new 
               SecurityException("Access to file : "+s+" denied."); 
    }  
}  

The heart of this security manager is the checkRead() method. It checks two things: it makes sure that the pathname we've been given isn't an absolute path, which could name any file in the filesystem; and it makes sure the pathname doesn't have a double dot (..) in it, which refers to the parent of the current directory. With these two restrictions, we can be sure (at least on a UNIX or DOS-like filesystem) that we have restricted access to the current directory and its subdirectories. If the pathname is absolute or contains "..", checkRead() throws a SecurityException.
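SecurityManager has been deprecated for removal since Java 17, so on a current JDK the same rule is better applied directly before opening the file. The sketch reproduces checkRead()'s two tests as a plain method and adds a second, alternative check that resolves the path against a document root with java.nio.file and verifies that the result stays inside it. PathGuard and both method names are hypothetical:

```java
import java.io.File;
import java.nio.file.Path;

class PathGuard {
    // The same two tests as TinyHttpdSecurityManager.checkRead():
    // reject absolute paths and anything containing "..".
    static boolean allowedByRule(String s) {
        return !(new File(s).isAbsolute() || s.indexOf("..") != -1);
    }

    // Alternative: resolve against the document root, normalize away
    // "." and ".." segments, and require the result to stay under the root.
    // Path.startsWith() compares whole name elements, so "www2" is not under "www".
    // Note: this does not account for symbolic links.
    static boolean allowedUnderRoot(Path root, String requested) {
        Path base = root.toAbsolutePath().normalize();
        Path target = base.resolve(requested).normalize();
        return target.startsWith(base);
    }
}
```

The second check is a little more permissive (it accepts a harmless name such as docs/../index.html) while still refusing anything that climbs out of the root.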

The other, do-nothing method implementations (for example, checkAccess()) allow the daemon to do its work without interference from the security manager. If we don't install a security manager, the application runs with no restrictions. However, as soon as we install any security manager, we inherit implementations of many "check" routines. The default implementations won't let you do anything; they just throw a security exception as soon as they are called. We have to open holes so the daemon can do its own work; it still has to accept connections, listen on sockets, create threads, read property lists, etc. Therefore, we override the default checks with routines that allow these things.

Now you're thinking, isn't that overly permissive? Not for this application; after all, TinyHttpd never tries to load foreign classes from the Net. The only code we are executing is our own, and it's assumed we won't do anything dangerous. If we were planning to execute untrusted code, the security manager would have to be more careful about what to permit.

Now that we have a security manager, we must modify TinyHttpd to use it. Two changes are necessary: we must install the security manager and catch the security exceptions it generates. To install the security manager, add the following code at the beginning of TinyHttpd's main() method:

System.setSecurityManager( new TinyHttpdSecurityManager() ); 

To catch the security exception, add the following catch clause after FileNotFoundException's catch clause:

catch ( SecurityException e ) {
    new PrintStream( out ).println( "403 Forbidden" );
}

Now the daemon can't access anything that isn't within the current directory or a subdirectory. If it tries to, the security manager throws an exception and prevents access to the file. The daemon then returns a standard HTTP error message to the client.

TinyHttpd still has room for improvement. First, it consumes a lot of memory by allocating a huge array to read the entire contents of the file all at once. A more realistic implementation would use a buffer and send large amounts of data in several passes. TinyHttpd also fails to deal with simple things like directories. It wouldn't be hard to add a few lines of code (again, refer to the ListIt example in Chapter 8, Input/Output Facilities) to read a directory and generate linked HTML listings like most Web servers do.
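Both improvements are small. The sketch below shows one way to do each: copying a stream through a fixed-size buffer so memory use stays constant however large the file is, and building a simple HTML page that links to each entry of a directory. TinyHttpdHelpers, copy(), and listing() are hypothetical names, and the 4096-byte buffer is an arbitrary choice:

```java
import java.io.*;

class TinyHttpdHelpers {
    // Send a stream in fixed-size chunks instead of one file-sized array.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {     // read() may return fewer bytes
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Build an HTML page with a link per directory entry; subdirectories get
    // a trailing slash. Names are used as-is; a real server would escape them.
    static String listing(File dir) {
        StringBuilder html = new StringBuilder("<html><body><ul>\n");
        String[] names = dir.list();              // null if dir isn't a directory
        if (names != null) {
            java.util.Arrays.sort(names);
            for (String name : names) {
                String href = new File(dir, name).isDirectory() ? name + "/" : name;
                html.append("<li><a href=\"").append(href).append("\">")
                    .append(href).append("</a></li>\n");
            }
        }
        return html.append("</ul></body></html>\n").toString();
    }
}
```

In TinyHttpdConnection, copy( fis, out ) would take the place of allocating and filling the byte array.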

9.2 Datagram Sockets

TinyHttpd used a Socket to create a connection to the client using the TCP protocol. In that example, TCP itself took care of data integrity; we didn't have to worry about data arriving out of order or incorrect. Now we'll take a walk on the wild side. We'll build an applet that uses a java.net.DatagramSocket, which uses the UDP protocol. A datagram is sort of like a "data telegram": it's a discrete chunk of data transmitted in one packet. Unlike the previous example, where we could get a convenient OutputStream from our Socket and write the data as if writing to a file, with a DatagramSocket we have to work one datagram at a time. (Of course, the TCP protocol was taking our OutputStream and slicing the data into packets, but we didn't have to worry about those details.)

UDP doesn't guarantee that the data will get through. If the data do get through, they may not arrive in the right order; it's even possible for duplicate datagrams to arrive. Using UDP is something like cutting the pages out of the encyclopedia, putting them into separate envelopes, and mailing them to your friend. If your friend wants to read the encyclopedia, it's his or her job to put the pages in order. If some pages got lost in the mail, your friend has to send you a letter asking for replacements.

Obviously, you wouldn't use UDP to send a huge amount of data. But it carries much less overhead than TCP, since there is no connection to set up and no acknowledgment or retransmission of individual packets; that makes it attractive when you don't care about the order in which messages arrive, or whether the data arrive at all. For example, in a database lookup, the client can send a query; the server's response itself constitutes an acknowledgment. If the response doesn't arrive within a certain time, the client can send another query. It shouldn't be hard for the client to match responses to its original queries. Some important applications that use UDP are the Domain Name System (DNS) and Sun's Network File System (NFS).
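The query-and-retry pattern described here can be sketched with DatagramSocket.setSoTimeout(), which makes receive() give up with a SocketTimeoutException. UdpQuery and ask() are hypothetical names, and the sketch accepts any reply as the answer; a real client would tag each query with an identifier so it could match a late reply to the right request:

```java
import java.io.IOException;
import java.net.*;
import java.nio.charset.StandardCharsets;

class UdpQuery {
    // Send a query datagram and wait for the reply, which doubles as the
    // acknowledgment. If nothing arrives within timeoutMillis, send again.
    static String ask(InetAddress host, int port, String query,
                      int timeoutMillis, int attempts) throws IOException {
        byte[] q = query.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket ds = new DatagramSocket()) {
            ds.setSoTimeout(timeoutMillis);               // receive() gives up after this
            for (int i = 0; i < attempts; i++) {
                ds.send(new DatagramPacket(q, q.length, host, port));
                DatagramPacket reply = new DatagramPacket(new byte[1024], 1024);
                try {
                    ds.receive(reply);
                    return new String(reply.getData(), 0, reply.getLength(),
                                      StandardCharsets.UTF_8);
                } catch (SocketTimeoutException e) {
                    // the query or the reply was lost: try again
                }
            }
        }
        throw new SocketTimeoutException("no reply after " + attempts + " attempts");
    }
}
```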

The HeartBeat Applet

In this section we'll build a simple applet, HeartBeat, that sends a datagram to its server each time it's started and stopped. (See Chapter 10, Understand the Abstract Windowing Toolkit for a complete discussion of the Applet class.) We'll also build a simple standalone server application, Pulse, that receives the datagrams and prints them. By tracking the output, you could have a crude measure of who is currently looking at your Web page at any given time. This is an ideal application for UDP: we don't want the overhead of a TCP socket, and if datagrams get lost, it's no big deal.

First, the HeartBeat applet:

import java.net.*; 
import java.io.*; 
 
public class HeartBeat extends java.applet.Applet { 
    String myHost; 
    int myPort; 
 
    public void init() { 
        myHost = getCodeBase().getHost(); 
        myPort = Integer.parseInt( getParameter("myPort") ); 
    } 
 
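    private void sendMessage( String message ) { 
        try { 
            // Copy the characters into bytes. This getBytes() form is the
            // Java 1.0 API (deprecated since 1.1; it keeps only the low 8 bits
            // of each char); message.getBytes() is the modern replacement.
            byte [] data = new byte [ message.length() ]; 
            message.getBytes(0, data.length, data, 0); 
            InetAddress addr = InetAddress.getByName( myHost ); 
            DatagramPacket pack = 
               new DatagramPacket(data, data.length, addr, myPort); 
 
            // No port argument: the system picks any free local port.
            DatagramSocket ds = new DatagramSocket(); 
            ds.send( pack ); 
            ds.close(); 
        }  
        catch ( IOException e ) { 
            // UnknownHostException, SocketException, or a send() failure
            System.out.println( e ); 
        } 
    } 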
 
    public void start() { 
        sendMessage("Arrived"); 
    } 
    public void stop() { 
        sendMessage("Departed"); 
    } 
} 

Compile the applet and include it in an HTML document with an <applet> tag:

<applet height=10 width=10 code=HeartBeat>  
    <param name="myPort" value="1234"> 
</applet> 

The myPort parameter should specify the port number on which our server application listens for data.

Next, the server-side application, Pulse:

import java.net.*; 
import java.io.*; 
 
public class Pulse { 
    public static void main( String [] argv ) throws IOException { 
 
        DatagramSocket s = 
           new DatagramSocket(Integer.parseInt(argv[0])); 
        while ( true ) { 
            // An empty 1024-byte packet for receive() to fill in.
            DatagramPacket packet = 
                new DatagramPacket(new byte [1024], 1024); 
            s.receive( packet ); 
            // Decode only the bytes actually received.
            String message = new String(packet.getData(), 0, 
                                        packet.getLength()); 
            System.out.println( "Heartbeat from: " +  
                packet.getAddress().getHostName() + " - " + message ); 
        } 
    } 
} 

Compile Pulse and run it on your Web server, specifying a port number as an argument:

% java Pulse 1234

The port number should be the same as the one you used in the myPort parameter of the <applet> tag for HeartBeat.
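If you'd like to exercise Pulse without a browser, a small standalone sender will do. This is a minimal sketch, not part of the original example: the class name HeartBeatSender and its command-line arguments are our own invention. It does the same job as sendMessage(), but uses String.getBytes() with an explicit character set and try-with-resources to close the socket.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class HeartBeatSender {
    // Same job as HeartBeat.sendMessage(), minus the applet: pack the text
    // into one datagram addressed to host:port and send it.
    static void send(String host, int port, String message) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.US_ASCII);
        InetAddress addr = InetAddress.getByName(host);
        try (DatagramSocket ds = new DatagramSocket()) {
            ds.send(new DatagramPacket(data, data.length, addr, port));
        }
    }

    // Usage: java HeartBeatSender <host> <port> <message>
    public static void main(String[] argv) throws IOException {
        send(argv[0], Integer.parseInt(argv[1]), argv[2]);
    }
}
```

With Pulse listening on port 1234, running `java HeartBeatSender localhost 1234 Arrived` in another window should produce one heartbeat line from Pulse.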

Now, pull up the Web page in your browser. You won't see anything there (a better application might do something visual as well), but you should get a blip from the Pulse application. Leave the page and return to it a few times. Each time the applet is started or stopped, it sends a message:

Heartbeat from: foo.bar.com - Arrived 
Heartbeat from: foo.bar.com - Departed 
Heartbeat from: foo.bar.com - Arrived 
Heartbeat from: foo.bar.com - Departed 
... 

Cool, eh? Just remember the datagrams are not guaranteed to arrive (although it's unlikely you'll see them fail), and it's possible that you could miss an arrival or a departure. Now let's look at the code.

HeartBeat

HeartBeat overrides the init(), start(), and stop() methods of the Applet class, and implements one private method of its own, sendMessage(), that sends a datagram. HeartBeat begins its life in init(), where it determines the destination for its messages. It uses the Applet's getCodeBase() method to get the URL it was loaded from, and that URL's getHost() method to find the name of its originating host; it fetches the port number from the myPort parameter of the HTML tag. After init() has finished, the start() and stop() methods are called whenever the applet is started or stopped. These methods merely call sendMessage() with the appropriate message.

sendMessage() is responsible for sending a String message to the server as a datagram. It takes the text as an argument, constructs a datagram packet containing the message, and then sends the datagram. All of the datagram information, including the destination and port number, is packed into a java.net.DatagramPacket object. The DatagramPacket is like an addressed envelope, stuffed with our bytes. After the DatagramPacket is created, sendMessage() simply has to open a DatagramSocket and send it.

The first four statements of sendMessage() build the DatagramPacket:

try { 
    byte [] data = new byte [ message.length() ]; 
    message.getBytes(0, data.length, data, 0); 
    InetAddress addr = InetAddress.getByName( myHost ); 
    DatagramPacket pack = 
       new DatagramPacket(data, data.length, addr, myPort ); 

First, the contents of message are copied into an array of bytes called data. Next, a java.net.InetAddress object is created from the name myHost. An InetAddress simply holds a host's Internet (IP) address, along with its name if that is known. We get an InetAddress object for our host by using the static getByName() method of the InetAddress class; the class has no public constructor, so we can't create one directly. Finally, we call the DatagramPacket constructor with four arguments: the byte array containing our data, the length of the data, the destination address object, and the port number.
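Here's the same envelope-building step pulled out on its own, so you can inspect what ends up inside the DatagramPacket. The class name PacketEnvelope is ours, and we use message.getBytes() with an explicit character set in place of the older getBytes(int, int, byte[], int) call; for ASCII text like "Arrived" the bytes are the same. Passing a literal address such as "127.0.0.1" to getByName() avoids a DNS lookup.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

public class PacketEnvelope {
    // Build the "addressed envelope": payload bytes plus destination address and port.
    static DatagramPacket envelope(String host, int port, String message)
            throws UnknownHostException {
        byte[] data = message.getBytes(StandardCharsets.US_ASCII);
        // getByName() accepts either a host name or a literal IP address string.
        InetAddress addr = InetAddress.getByName(host);
        return new DatagramPacket(data, data.length, addr, port);
    }

    public static void main(String[] args) throws UnknownHostException {
        DatagramPacket pack = envelope("127.0.0.1", 1234, "Arrived");
        System.out.println(pack.getAddress().getHostAddress() + ":" + pack.getPort()
                           + " carries " + pack.getLength() + " bytes");
        // prints: 127.0.0.1:1234 carries 7 bytes
    }
}
```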

The remaining lines construct a default client DatagramSocket and call its send() method to transmit the DatagramPacket; after sending the datagram, we close the socket:

DatagramSocket ds = new DatagramSocket(); 
ds.send( pack ); 
ds.close(); 

Three operations can throw a type of IOException: the InetAddress.getByName() lookup, the DatagramSocket constructor, and the DatagramSocket send(). InetAddress.getByName() can throw an UnknownHostException, which is a type of IOException that indicates that the host name can't be resolved. The constructor throws a SocketException, another subclass of IOException, if it can't open a socket. If send() throws an IOException, it implies a serious client-side problem in talking to the network. We need to catch these exceptions; our catch block simply prints a message telling us that something went wrong. If we get one of these exceptions, we can assume the datagram never arrived. However, we can't assume the converse. Even if we don't get an exception, we still don't know that the host is actually accessible or that the data actually arrived; with a DatagramSocket, we never find out.
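You can watch the "we never find out" behavior directly: send a datagram to a port where nothing is listening, and send() still returns normally. This is a sketch assuming an ordinary unconnected DatagramSocket on the loopback interface; the class name SilentSend is ours, and we find an unused port by binding one briefly and then closing it.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class SilentSend {
    // Returns true if send() completed without an exception. For an unconnected
    // DatagramSocket that proves nothing about delivery.
    static boolean sendTo(InetAddress addr, int port, byte[] data) {
        try (DatagramSocket ds = new DatagramSocket()) {
            ds.send(new DatagramPacket(data, data.length, addr, port));
            return true;
        } catch (IOException e) {   // SocketException from the constructor, or a send() failure
            System.out.println(e);
            return false;
        }
    }

    // Bind a socket to any free port, note the number, and close it again,
    // leaving (almost certainly) nobody listening there.
    static int unusedPort(InetAddress addr) throws IOException {
        try (DatagramSocket probe = new DatagramSocket(0, addr)) {
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        int port = unusedPort(lo);
        System.out.println("send() returned normally: "
                           + sendTo(lo, port, new byte[] { 1, 2, 3 }));
    }
}
```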

Pulse

The Pulse server corresponds to the HeartBeat applet. First, it creates a DatagramSocket to listen on our prearranged port. This time, we specify a port number in the constructor; we get the port number from the command line as a string (argv[0]) and convert it to an integer with Integer.parseInt(). Note the difference between this call to the constructor and the call in HeartBeat. In the server, we need to listen for incoming datagrams on a prearranged port, so we need to specify the port when creating the DatagramSocket. In the client, we only need to send datagrams, so we don't have to specify a port when creating the socket; the system picks any free local port, and the destination port number is built into the DatagramPacket itself.
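The two constructor calls can be compared side by side. In this sketch (the class name PortChoice is ours), one socket is bound to a port we choose, as in Pulse, and the other uses the no-argument constructor, as in HeartBeat, so the system assigns it some free local port. getLocalPort() reports what each socket got.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

public class PortChoice {
    // Open a Pulse-style socket on a prearranged port and a HeartBeat-style socket
    // with no port argument; return the local port each one ended up bound to.
    static int[] localPorts(int serverPort) throws SocketException {
        try (DatagramSocket server = new DatagramSocket(serverPort);  // like Pulse
             DatagramSocket client = new DatagramSocket()) {           // like HeartBeat
            return new int[] { server.getLocalPort(), client.getLocalPort() };
        }
    }

    // Usage: java PortChoice 1234
    public static void main(String[] args) throws SocketException {
        int[] ports = localPorts(Integer.parseInt(args[0]));
        System.out.println("server bound to " + ports[0] + ", client got " + ports[1]);
    }
}
```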

Second, Pulse creates an empty DatagramPacket of a fixed size to receive an incoming datagram. This alternative constructor for DatagramPacket takes a byte array and a length as arguments. When a datagram is received, as much of it as fits is stored in the byte array; anything beyond the buffer's length is discarded, and the packet's getLength() method reports how many bytes were actually stored. (The protocol allows datagrams of nearly 64K, but a practical limit on the size of a UDP datagram is 8K.) Finally, Pulse calls the DatagramSocket's receive() method to wait for a packet to arrive. When a packet arrives, its contents are printed.
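Truncation is easy to demonstrate on the loopback interface: receive into a buffer smaller than the datagram, and getLength() reports only what fit. In this sketch the class name Truncation and the message text are ours; a receive timeout keeps a lost datagram from hanging the program.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class Truncation {
    // Send message over loopback to a receiver whose buffer holds only bufSize
    // bytes, and return whatever actually landed in that buffer.
    static String roundTrip(String message, int bufSize) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (DatagramSocket rx = new DatagramSocket(0, lo);
             DatagramSocket tx = new DatagramSocket()) {
            rx.setSoTimeout(2000);   // don't hang if the datagram is lost
            byte[] out = message.getBytes(StandardCharsets.US_ASCII);
            tx.send(new DatagramPacket(out, out.length, lo, rx.getLocalPort()));
            DatagramPacket in = new DatagramPacket(new byte[bufSize], bufSize);
            rx.receive(in);          // bytes beyond bufSize are discarded
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("Heartbeat from far away", 9));     // prints: Heartbeat
        System.out.println(roundTrip("Heartbeat from far away", 1024));  // the whole message
    }
}
```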

As you can see, working with DatagramSocket is slightly more tedious than working with Sockets. With datagrams, it's harder to spackle over the messiness of the socket interface. However, the Java API rather slavishly follows the UNIX interface, and that doesn't help. I don't see any reason why we have to prepare a datagram to hand to receive() (at least for the current functionality); receive() ought to create an appropriate object on its own and hand it to us, saving us the effort of building the datagram in advance and unpacking the data from it afterwards. It's easy to imagine other conveniences; perhaps we'll have them in a future release.
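Nothing stops us from writing those conveniences ourselves. The helpers below are hypothetical, not part of java.net: sendText() builds the outgoing packet, and receiveText() allocates the receive buffer and hands back the decoded text, so the body of a loop like the one in Pulse shrinks to a single call. Note that this version throws away the sender's address, which Pulse prints; a fuller helper would need to return that too.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class EasyDatagrams {
    // Hypothetical convenience: build the packet for the caller and send text.
    static void sendText(DatagramSocket socket, String text,
                         InetAddress addr, int port) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, addr, port));
    }

    // Hypothetical convenience: allocate the receive buffer internally and
    // return just the decoded text (at most maxSize bytes of it).
    static String receiveText(DatagramSocket socket, int maxSize) throws IOException {
        DatagramPacket packet = new DatagramPacket(new byte[maxSize], maxSize);
        socket.receive(packet);
        return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (DatagramSocket rx = new DatagramSocket(0, lo);
             DatagramSocket tx = new DatagramSocket()) {
            sendText(tx, "Arrived", lo, rx.getLocalPort());
            System.out.println(receiveText(rx, 1024));   // prints: Arrived
        }
    }
}
```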

9.3 Working with URLs

A URL points to an object on the Internet. It's a collection of information that identifies an item, tells you where to find it, and specifies a method for communicating with it or retrieving it from its source. A URL can refer to any kind of information source. It might point to static data, such as a file on a local filesystem, a Web server, or an FTP archive; or it can point to a more dynamic object such as a news article on a news spool or a record in a WAIS database. URLs can even refer to less tangible resources such as Telnet sessions and mailing addresses.

A URL is usually presented as a string of text, like an address.[3] Since there are many different ways to locate an item on the Net, and different media and transports require different kinds of information, there are different formats for different kinds of URLs. The most common form specifies three things: a network host or server, the name of the item and its location on that host, and a protocol by which the host should communicate:

protocol://hostname/location/item

protocol is an identifier such as "http," "ftp," or "gopher"; hostname is an Internet hostname; and the location and item components form a path that identifies the object on that host. Variants of this form allow extra information to be packed into the URL, specifying things like port numbers for the communications protocol and fragment identifiers that reference parts inside the object.

[3] The term URL was coined by the Uniform Resource Identifier (URI) working group of the IETF to distinguish URLs from the more general notion of Uniform Resource Names or URNs. URLs are really just static addresses, whereas URNs would be more persistent and abstract identifiers used to resolve the location of an object anywhere on the Net. URLs are defined in RFC 1738 and RFC 1808.

We sometimes speak of a URL that is relative to a base URL. In that case we are using the base URL as a starting point and supplying additional information. For example, the base URL might point to a directory on a Web server; a relative URL might name a particular file in that directory.
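The URL class can do this resolution for us with its URL(URL context, String spec) constructor. In this sketch the class name RelativeUrls and its resolve() helper are ours. Note how a trailing slash on the base decides whether its last path segment is treated as a directory. Constructing the URLs makes no network connection.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RelativeUrls {
    // Resolve a relative URL against a base URL and return the result as a string.
    static String resolve(String base, String relative) throws MalformedURLException {
        return new URL(new URL(base), relative).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // The base names a directory (trailing slash), so the file lands inside it.
        System.out.println(resolve("http://foo.bar.com/documents/", "homepage.html"));
        // prints: http://foo.bar.com/documents/homepage.html

        // Without the trailing slash, the last path segment is replaced instead.
        System.out.println(resolve("http://foo.bar.com/documents", "homepage.html"));
        // prints: http://foo.bar.com/homepage.html
    }
}
```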

The URL class

A URL is represented by an instance of the java.net.URL class. A URL object manages all information in a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL specification string or from its component parts:

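try { 
    URL aDoc = new URL( "http://foo.bar.com/documents/homepage.html" ); 
    // protocol, host, file: the file part needs its leading slash
    URL sameDoc = 
        new URL("http","foo.bar.com","/documents/homepage.html"); 
}  
catch ( MalformedURLException e ) { 
    // thrown if the protocol is unknown or the specification can't be parsed
} 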

The two URL objects above point to the same network resource, the homepage.html document on the server foo.bar.com. Whether or not the resource actually exists and is available isn't known until we try to access it. At this point, the URL object just contains data about the object's location and how to access it. No connection to the server has been made. We can examine the URL's components with the getProtocol(), getHost(), and getFile() methods. We can also compare it to another URL with the sameFile() method, which determines whether two URLs point to the same resource. sameFile() does more than compare the URLs for equality: it ignores any fragment identifier and takes into account the possibility that one server may have several names, among other factors. It can still be fooled, though.
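The accessor methods and sameFile() look like this in practice. This sketch (the class name UrlParts is ours) uses the loopback address 127.0.0.1 instead of foo.bar.com because sameFile() may try to resolve host names, and a literal address needs no DNS lookup. getRef(), which returns the fragment identifier, is another URL accessor not mentioned above.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    public static void main(String[] args) throws MalformedURLException {
        URL a = new URL("http://127.0.0.1/documents/homepage.html#contents");
        URL b = new URL("http://127.0.0.1/documents/homepage.html");

        System.out.println(a.getProtocol());  // http
        System.out.println(a.getHost());      // 127.0.0.1
        System.out.println(a.getFile());      // /documents/homepage.html
        System.out.println(a.getRef());       // contents (the fragment identifier)

        // sameFile() compares everything except the fragment...
        System.out.println(a.sameFile(b));    // true
        // ...while equals() also requires the fragments to match.
        System.out.println(a.equals(b));      // false
    }
}
```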

When a URL is created, its specification is parsed to identify the protocol component. If the protocol doesn't make sense, or if Java can't find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an "http" URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified server.
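A quick way to see this is to ask for a URL whose protocol no handler knows about. In this sketch the class UrlCheck, its tryParse() helper, and the made-up protocol name "wibble" are ours.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCheck {
    // Returns the parsed URL, or null (after reporting why) if the
    // constructor rejects the specification.
    static URL tryParse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            System.out.println("rejected " + spec + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        tryParse("http://foo.bar.com/index.html");    // accepted: an http handler exists
        tryParse("wibble://foo.bar.com/index.html");  // rejected: no handler for "wibble"
        tryParse("index.html");                       // rejected: no protocol at all
    }
}
```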

Stream Data

The most general way to get data back from a URL is to ask for an InputStream from the URL by calling openStream(). If you're writing an applet that will be running under Netscape, this is about your only choice. In fact, it's a good choice if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of an object yourself. Not all types of URLs support the openStream() method; you'll get an UnknownServiceException if yours doesn't.

The following code reads a single line from an HTML file:

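try { 
    URL url = new URL("http://server/index.html"); 
    DataInputStream dis = new DataInputStream( url.openStream() ); 
    String line = dis.readLine();  // deprecated since 1.1; BufferedReader is the modern way
    dis.close(); 
} 
catch ( IOException e ) { }  // includes MalformedURLException, UnknownServiceException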

We ask for an InputStream with openStream(), and wrap it in a DataInputStream to read a line of text. Here, because we are specifying the "http" protocol in the URL, we still require the services of an HTTP protocol handler. As we'll discuss more in a bit, that brings up some questions about what handlers we have available to us and where. This example partially works around those issues because no content handler is involved; we read the data and interpret it as a content handler would. However, there are even more limitations on what applets can do right now. For the time being, if you construct URLs relative to the applet's getCodeBase(), you should be able to use them in applets as in the above example. This should guarantee that the needed protocol is available and accessible to the applet. Again, we'll discuss the more general issues a bit later.
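Here's a runnable version of the same idea, written as a sketch rather than as the original example. To avoid needing a server, it reads from a file: URL for a temporary file (an assumption standing in for http://server/index.html), and it uses a BufferedReader, the modern replacement for DataInputStream's deprecated readLine(). The class FirstLine and its firstLine() method are ours.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstLine {
    // Open the URL's stream and return its first line (null if it's empty).
    static String firstLine(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file stands in for a page on a Web server.
        Path page = Files.createTempFile("index", ".html");
        Files.write(page, "<html>\n<body>Hello</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        URL url = page.toUri().toURL();
        System.out.println(firstLine(url));   // prints: <html>
        Files.delete(page);
    }
}
```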

Getting the Content as an Object

openStream() operates at a lower level than the more general content-handling mechanism implemented by the URL class. We showed it first because, until some things are settled, you'll be limited as to when you can use URLs in their more powerful role. When a proper content handler is available to Java (currently, only if you supply one with your standalone application), you'll be able to retrieve the object the URL addresses as a complete object, by calling the URL's getContent() method. getContent() initiates a connection to the host, fetches the data for you, determines the data's MIME type, and invokes a content handler to turn the data into a Java object.

For example, given the URL http://foo.bar.com/index.html, a call to getContent() uses the HTTP protocol handler to receive the data and the HTML content handler to turn the data into some kind of object. A URL that points to a plain-text file would use a text-content handler that might return a String object. A GIF file might be turned into an Image object for display, using a GIF content handler. If we accessed the GIF file using an "ftp" URL, Java would use the same content handler, but would use the FTP protocol handler to receive the data.

getContent() returns the output of the content handler. Now we're faced with a problem: exactly what did we get? Since the content handler can return almost anything, the return type of getContent() is Object. Before doing anything meaningful with this Object, we must cast it into some other data type that we can work with. For example, if we expect a String, we'll cast the result of getContent() to a String:

String content = null; 
 
try { 
    content = (String)myURL.getContent(); 
} 
catch ( Exception e ) { }  // IOException, or ClassCastException if not a String

Of course, we are presuming we will in fact get a String object back from this URL. If we're wrong, we'll get a ClassCastException. Since it's common for servers to be confused (or even lie) about the MIME types of the objects they serve, it's wise to catch that exception (it's a subclass of RuntimeException, so catching it is optional) or to check the type of the returned object with the instanceof operator:

Object content = myURL.getContent(); 
if ( content instanceof String ) { 
    String s = (String)content; 
    ... 
} 

Various kinds of errors can occur when trying to retrieve the data. For example, getContent() can throw an IOException if there is a communications error; IOException is not a type of RuntimeException, so we must catch it explicitly or declare that the method calling getContent() can throw it. Other kinds of errors can happen at the application level; handling them requires some knowledge of how the handlers deal with errors.

For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server probably returns a valid HTML document that consists of the familiar "404 Not Found" message. An appropriate HTML content handler is invoked to interpret this and return it as it would any other HTML object. At this point, there are several alternatives, depending entirely on the content handler's implementation. It might return a String containing the error message; it could also conceivably return some other kind of object or throw a specialized subclass of IOException. To find out that an error occurred, the application may have to look directly at the object returned from getContent(). After all, what is an error to the application may not be an error as far as the protocol or content handlers are concerned. "404 Not Found" isn't an error at this level; it's a perfectly valid document.

Another type of error occurs if a content handler that understands the data's MIME type isn't available. In this case, getContent() invokes a minimal content handler used for data with an unknown type and returns the data as a raw InputStream. A sophisticated application might specialize this behavior to try to decide what to do with the data on its own.
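Putting the advice of this section together, a caller can check the returned object's type before using it. In this sketch, asText() (our own helper, in a class we've named ContentAsText) accepts either a String or an InputStream, such as the raw stream returned for an unknown type, and refuses anything else. The demo reads a temporary local text file through a file: URL; which Java type the built-in handlers return for it is an implementation detail, which is exactly why the helper checks rather than casts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentAsText {
    // Turn whatever getContent() produced into text, checking the type with
    // instanceof rather than casting blindly.
    static String asText(Object content) throws IOException {
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {      // e.g. the raw stream for unknown types
            try (InputStream in = (InputStream) content) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        String what = (content == null) ? "null" : content.getClass().getName();
        throw new IOException("don't know how to handle " + what);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("greeting", ".txt");
        Files.write(file, "hello, content handler".getBytes(StandardCharsets.UTF_8));
        Object content = file.toUri().toURL().getContent();
        System.out.println(content.getClass().getName());  // whichever type the handler chose
        System.out.println(asText(content));
        Files.delete(file);
    }
}
```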

The openStream() and getContent() methods both implicitly create a connection to the remote URL object. For some applications, it may be necessary to use the openConnection() method of the URL to interact directly with the protocol handler. openConnection() returns a URLConnection object, which represents a single, active connection to the URL resource. We'll examine URLConnections further when we start writing protocol handlers.
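As a preview, here's a sketch of using a URLConnection directly (the class ConnectionPeek and its summarize() helper are ours). It asks the connection for the MIME type and length the protocol handler reports before reading any data. A temporary local file reached through a file: URL stands in for a remote resource so the example runs without a server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConnectionPeek {
    // Inspect what the protocol handler reports about the resource, then read it.
    static String summarize(URL url) throws IOException {
        URLConnection conn = url.openConnection();   // object created; not connected yet
        conn.connect();                              // now actually contact the resource
        String type = conn.getContentType();         // MIME type, or null if unknown
        int length = conn.getContentLength();        // -1 if unknown
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return type + " | " + length + " | " + in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("note", ".txt");
        Files.write(file, "hello, connection\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(summarize(file.toUri().toURL()));
        Files.delete(file);
    }
}
```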